Summary
The paper covers three categories of gestures (pointing, area, and contour), each with three phases: preparation, the gesture itself, and retraction. Features measuring distances and angles between the face and hands feed into an HMM, which gets 50-60% accuracy on four test sequences.
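The paper doesn't give code, but the gesture-only pipeline is easy to sketch: train one HMM per gesture class and classify by maximum log-likelihood. This is a minimal sketch assuming hmmlearn, 2D tracker coordinates, and three hidden states to mirror the three phases; the exact feature set and model parameters are my guesses, not the paper's.

```python
# Sketch of a per-class HMM gesture classifier; hmmlearn and the
# specific features are assumptions, not the paper's exact setup.
import numpy as np
from hmmlearn import hmm

def features(face, hand):
    """Per-frame distance and angle between face and hand.

    face, hand: (T, 2) arrays of (x, y) tracker coordinates.
    Returns a (T, 2) feature matrix: [distance, angle].
    """
    diff = hand - face
    dist = np.linalg.norm(diff, axis=1)
    angle = np.arctan2(diff[:, 1], diff[:, 0])
    return np.column_stack([dist, angle])

def train_models(sequences_by_class, n_states=3):
    """Fit one Gaussian HMM per gesture class; three states loosely
    correspond to the preparation/gesture/retraction phases."""
    models = {}
    for label, seqs in sequences_by_class.items():
        X = np.vstack(seqs)                # stacked feature frames
        lengths = [len(s) for s in seqs]   # per-sequence lengths
        m = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag", n_iter=50)
        m.fit(X, lengths)
        models[label] = m
    return models

def classify(models, seq):
    """Pick the class whose HMM assigns the highest log-likelihood."""
    return max(models, key=lambda label: models[label].score(seq))
```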
Next, speech is added to the gesture data: co-occurrences of marker words with the different gestures are computed and used to help the HMM classify. This raises accuracy by about 10%.
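One simple way to realize that fusion, and I'm only assuming this matches the paper's scheme, is to turn the co-occurrence counts into smoothed log priors P(gesture | word) and add them to the HMM log-likelihood. The `weight` knob and the helper names below are hypothetical.

```python
# Minimal sketch of speech-gesture fusion via co-occurrence priors;
# the scoring formula and weight are assumptions, not the paper's.
import numpy as np

def word_log_priors(cooccurrence, labels, smoothing=1.0):
    """cooccurrence[word][label] -> count of the marker word heard
    alongside gestures of that class. Returns log P(label | word)
    with add-one smoothing over the full label set."""
    priors = {}
    for word, counts in cooccurrence.items():
        total = (sum(counts.get(l, 0) for l in labels)
                 + smoothing * len(labels))
        priors[word] = {l: np.log((counts.get(l, 0) + smoothing) / total)
                        for l in labels}
    return priors

def classify_with_speech(models, seq, words, priors, weight=1.0):
    """Combine each class HMM's log-likelihood with the log priors of
    the marker words heard during the gesture window."""
    def score(label):
        s = models[label].score(seq)
        for w in words:
            if w in priors:
                s += weight * priors[w][label]
        return s
    return max(models, key=score)
```

With `weight=0` this degenerates to the gesture-only classifier, which makes the reported ~10% gain easy to reproduce as an ablation.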
Discussion
Adding speech to gesture data improves accuracy. That's a fairly obvious claim, and they've shown that it holds, if only by a small margin. The one thing I don't like is the manual labeling of the speech data.
I wish they had tested more gestures, and their accuracies weren't great. But at least the work is a fusion of contextual data.