Monday, February 11, 2008

Song - Forward Spotting Accumulative HMMs

Daehwan Kim and Daijin Kim, "An Intelligent Smart Home Control Using Body Gestures," in Proc. 2006 International Conference on Hybrid Information Technology (ICHIT'06), vol. 2, pp. 439-446, Nov. 2006.

Summary



Kim and Kim present an algorithm for segmenting a stream of gestures and recognizing the segmented gestures. They take a sliding window of postures (observations) from the stream and feed it into an HMM system that has one model per gesture class, plus one "Non-Gesture" HMM. They say a gesture has started if the maximum probability from the gesture HMMs exceeds the probability of the Non-Gesture HMM. They call this the competitive differential observation probability (CDOP): the difference between the max gesture probability and the non-gesture probability (positive means gesture, negative means non-gesture, and crossing 0 marks the start or end of a gesture).
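The spotting rule above can be sketched in a few lines. This is a minimal sketch, not the paper's implementation: the names are assumed, and real HMM forward-algorithm scoring is stood in for by plain scoring functions passed as callables.

```python
def cdop(window, gesture_scores, non_gesture_score):
    """Competitive differential observation probability:
    max gesture log-likelihood minus non-gesture log-likelihood.
    cdop > 0 means a gesture is in progress; a sign change
    marks a gesture start or end point."""
    best = max(score(window) for score in gesture_scores)
    return best - non_gesture_score(window)

def spot(stream, window_len, gesture_scores, non_gesture_score):
    """Slide a window over the posture stream and report
    (event, frame) pairs for gesture starts and ends."""
    inside, events = False, []
    for t in range(window_len, len(stream) + 1):
        d = cdop(stream[t - window_len:t], gesture_scores, non_gesture_score)
        if d > 0 and not inside:
            inside = True
            events.append(("start", t - window_len))
        elif d <= 0 and inside:
            inside = False
            events.append(("end", t))
    return events
```

With toy scores (gesture score = sum of the window, a constant non-gesture score), a burst of non-zero postures in the stream produces one start event and one end event, which is all the CDOP sign-crossing rule amounts to.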

Once a gesture is observed to have started, they begin classifying the gesture segments (segmenting the segmented gesture). They feed the segments into the HMMs and get a classification for each segment. Once the gesture is found to have terminated (the CDOP drops below 0, i.e., the stream becomes a non-gesture), they look at the classification results for all the segments and take a majority vote to determine the class for the whole gesture.

So we have a sliding window. Within that window, we decide a gesture starts and later see that it ends. Between the start and end points, we segment the gesture stream further. Say there are 3 segments. Then we'd classify {1}, {12}, and {123}. Pretend {1} and {123} were "OPEN CURTAINS" and {12} was "CLOSE CURTAINS." The majority vote, after the end of the gesture, would rule the gesture as "OPEN CURTAINS."
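The voting step in that example is simple enough to sketch directly. This is a hypothetical sketch: `classify` stands in for running a prefix of segments through the gesture HMMs and taking the most likely class.

```python
from collections import Counter

def classify_by_vote(segments, classify):
    """Accumulative classification: classify the growing prefixes
    {1}, {1,2}, ..., {1..n}, then majority-vote the labels to get
    the class for the whole spotted gesture."""
    votes = [classify(segments[:k]) for k in range(1, len(segments) + 1)]
    return Counter(votes).most_common(1)[0][0]
```

Plugging in the example above (prefixes {1} and {123} classified as "OPEN CURTAINS", {12} as "CLOSE CURTAINS") yields "OPEN CURTAINS" by a 2-to-1 vote.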

They give some results, which seem to show their automatic method performs better than a manual method, but it's not clear what the manual method is. They seem to get about 95% accuracy classifying 8 gestures made with the arms to open/close curtains and turn on/off the lights.

Discussion



So basically they just use a probabilistic threshold to decide whether a gesture has started: an observation is checked against a non-gesture model (much like Iba's use of a wait state when recognizing robot gestures). So don't call it the CDOP. Call it a "junk" class or "non-gesture" class. They made it sound much harder to understand than it is.

When they give their results in Figure 5 and show the curves for manual segmentation, what the heck does \theta mean? This wasn't explained and makes their figure all but useless.

So this seems like a decent method for segmenting gestures...10 years ago. Iba had almost the exact same thing in his robot gesture recognition system, and I'm sure he wasn't the first. Decent results, I think (can't really interpret their graph), but nothing really noteworthy.

The only thing they do differently is the majority vote over the sub-segments of each spotted gesture. Yeah, confusing. I'm not sure how much this improves recognition, as they did not compare results with and without it. It seems to me like it would only take up more computation time for gains that aren't that significant.

BibTeX


@INPROCEEDINGS{song2006forwardSpottingAccumulativeHMM,
  title={An Intelligent Smart Home Control Using Body Gestures},
  author={Daehwan Kim and Daijin Kim},
  booktitle={2006 International Conference on Hybrid Information Technology (ICHIT'06)},
  year={2006},
  month={Nov.},
  volume={2},
  pages={439--446},
  doi={10.1109/ICHIT.2006.253644},
}

3 comments:

Brandon said...

I agree - not sure sub-segmenting is worth it, and I wonder if the system runs in real-time. They didn't say...

Grandmaster Mash said...

My gut feeling is that sub-segmenting would be more likely to find the correct gesture, but I agree that it would have been nice to see results. I'd also like them to explain their manual versus automatic thresholding, and why they even bothered to mention that they manually took thresholds if automatic thresholding works better. Save the space and make the figures actually readable...

Paul Taele said...

From what I got from the paper, there's no way this system could run in real-time. I felt there are too many components in their system (partial gesture recognition, majority voting) for it to do anything but crawl. Also, I didn't pick up the similarity to the Iba paper, but now I'm not as impressed with this one.