Wednesday, February 6, 2008

Lee - Interactive Learning HMMs

Lee, Christopher, and Yangsheng Xu. "Online, Interactive Learning of Gestures for Human/Robot Interfaces."

Okay, right off the bat, this paper has nothing to do with robots. Why put it in the title?

Summary

Lee and Xu present an algorithm for classifying gestures with HMMs, evaluating the confidence of each classification, and using correct classifications to update the parameters of the HMM. The user has to pause briefly between gestures to aid segmentation. To simplify the data, they run fast Fourier transforms (FFTs) over a sliding window of glove sensor data, collapsing each window into a feature vector. They then feed the FFT results through vector quantization (using an off-line codebook generated with LBG) to collapse each vector to a single discrete symbol. The resulting symbol sequence is fed into the HMMs, one per class, and the class whose HMM gives the highest Pr(O|model) is selected as the answer. The gesture is then folded into the training set for that HMM and the model's parameters are updated.
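Here's a minimal sketch of how I read that pipeline. The window shape, the codebook, and the scaled forward pass are my own placeholder choices for illustration, not details taken from the paper:

```python
import numpy as np

def fft_features(window):
    """Collapse one sliding window of glove sensor data into a feature
    vector: magnitudes of the FFT along the time axis.
    window: (window_len, n_sensors) array."""
    return np.abs(np.fft.rfft(window, axis=0)).ravel()

def quantize(feature_vec, codebook):
    """Vector quantization: index of the nearest codeword.
    codebook: (n_codewords, feature_dim) array, built offline (e.g. with LBG)."""
    return int(np.argmin(np.linalg.norm(codebook - feature_vec, axis=1)))

def log_likelihood(symbols, start_prob, trans, emit):
    """log Pr(O | model) for a discrete HMM via the scaled forward algorithm.
    start_prob: (n_states,), trans: (n_states, n_states),
    emit: (n_states, n_symbols)."""
    alpha = start_prob * emit[:, symbols[0]]
    log_p = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for s in symbols[1:]:
        alpha = (alpha @ trans) * emit[:, s]
        c = alpha.sum()
        log_p += np.log(c)
        alpha = alpha / c
    return log_p

def classify(symbols, models):
    """Pick the class whose HMM gives the highest Pr(O | model).
    models: dict of class name -> (start_prob, trans, emit)."""
    scores = {name: log_likelihood(symbols, *params)
              for name, params in models.items()}
    return max(scores, key=scores.get), scores
```

The per-gesture parameter update (folding the new example into that class's training set and re-estimating) would sit on top of this; I've left it out to keep the sketch short.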

They also introduce a confidence measure for analyzing their system's performance: the log of the sum, over all the incorrect HMMs, of the ratio of that HMM's probability for a gesture to the correct HMM's probability for the gesture. If a gesture is classified correctly, the correct HMM will have a higher probability than all the incorrect HMMs, all the ratios will be < 1, and (when the correct model dominates clearly) the log of their sum will be below 0. If all the probabilities are about the same, the classifier is unsure, the ratios will all be around 1, and the log will be around 0. They show that, starting with one training example, they achieve high and confident classification after only a few testing examples are classified and used to update the system.
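As a rough sketch of that measure (my own re-implementation of the idea as described above, working in log space to avoid underflow; the paper's exact formulation may differ in details):

```python
import numpy as np

def confidence(scores, chosen):
    """scores: dict of class -> log Pr(O | model); chosen: the selected class.
    Computes log( sum over the other models of P_other / P_chosen ).
    Well below 0 when the chosen model dominates, near 0 when models tie."""
    diffs = np.array([s - scores[chosen]
                      for name, s in scores.items() if name != chosen])
    m = diffs.max()  # log-sum-exp trick for numerical stability
    return float(m + np.log(np.exp(diffs - m).sum()))
```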

However, they're only using a set of ASL letters that are "amenable to VQ clustering."

Discussion

I do like the idea of online training and updating of the model. However, after a few users the benefit fades, so it seems better to just collect a good training set and train offline before any recognition takes place, which simplifies the system and reduces the workload.

I don't like that you have to hold your hand still for a little bit between gestures. I would have liked to have seen something like the "wait state" HMM system discussed in Iba et al., "An architecture for gesture-based control of mobile robots." I'd like to see a better handle on the segmentation problem. They do mention using acceleration.

Their training set is too small and easy, since they picked letters that are "amenable to VQ clustering," so I don't give their system much credit.

1 comment:

Test said...

you don't "give their system much" what?

I liked your comment about the title. Funny and true.