Wednesday, February 27, 2008

Sagawa - Recognizing Sequence Japanese Sign Lang. Words

Sagawa, H. and Takeuchi, M. 2000. "A Method for Recognizing a Sequence of Sign Language Words Represented in a Japanese Sign Language Sentence." In Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition 2000 (March 26 - 30, 2000). FG. IEEE Computer Society, Washington, DC, 434.

Summary



The authors present a method for segmenting gestures in Japanese sign language. Using a set of 200 JSL sentences (100 for training and 100 for testing), they train a set of parameter thresholds. The thresholds are used to determine the borders of signed words, to decide whether a word is one- or two-handed, and to distinguish transition movements from actual words.

They segment gestures using "hand velocity," which is the average change in position of all the hand parts from one point to the next. A point of minimal hand velocity (when all the parts are staying relatively still) is flagged as a possible segmentation point (i.e., Sezgin's speed points). The other source of candidate segmentation points is a cosine metric, which takes the inner product of the hand's trajectory before and after the current point over a window of +/- n points; if the change in direction exceeds a threshold, that point is also flagged as a candidate (i.e., Sezgin's curvature points). Erroneous velocity candidates are thrown out if the velocity change from (t-n) to t or from t to (t+n) is not great enough.
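A minimal sketch of how I read their two candidate tests (not their actual code: the window size n and all threshold values below are made-up placeholders, and I'm assuming the hand-part positions come as a (T, P, 3) array):

import numpy as np

def hand_velocity(pos):
    # pos: (T, P, 3) array -- T frames, P tracked hand elements, 3D coords.
    # "Hand velocity" = average frame-to-frame displacement of all parts.
    return np.linalg.norm(np.diff(pos, axis=0), axis=2).mean(axis=1)

def segmentation_candidates(pos, n=3, v_min=0.05, cos_max=0.7, dv_min=0.02):
    # Flag candidate word borders; thresholds are illustrative, not trained.
    vel = hand_velocity(pos)
    flat = pos.reshape(len(pos), -1)  # one flattened vector per frame
    candidates = set()
    for t in range(n, len(vel) - n):
        # Speed candidate (cf. Sezgin's speed points): near-zero velocity,
        # kept only if velocity actually rises by dv_min on both sides of t.
        if vel[t] < v_min and (vel[t - n] - vel[t] > dv_min
                               and vel[t + n] - vel[t] > dv_min):
            candidates.add(t)
        # Curvature candidate (cf. Sezgin's curvature points): the cosine of
        # the angle between the motion before and after t falls below cos_max.
        before, after = flat[t] - flat[t - n], flat[t + n] - flat[t]
        denom = np.linalg.norm(before) * np.linalg.norm(after)
        if denom > 0 and before @ after / denom < cos_max:
            candidates.add(t)
    return sorted(candidates)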

Determination of which hands are used (both hands vs. one hand, and right vs. left) is done by comparing the hand velocities of the two hands, both via "which hand's maximum velocity is greater" (Eq. 3) and "is the average squared difference in velocity greater than 1?" (Eq. 4). Thresholds are trained for these tests.
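A rough sketch of how I understand that decision (again, the exact form of Eqs. 3 and 4 and the trained threshold values are in the paper; the function name and numbers below are placeholders):

import numpy as np

def which_hands(vel_right, vel_left, diff_thresh=1.0):
    # vel_right, vel_left: per-frame hand velocities over a segmented word.
    # Eq. 4-style test: if the two hands' velocity profiles are similar
    # (small average squared difference), call the sign two-handed.
    vr, vl = np.asarray(vel_right), np.asarray(vel_left)
    if np.mean((vr - vl) ** 2) < diff_thresh:
        return "two-handed"
    # Eq. 3-style test: otherwise the hand with the larger peak velocity
    # is taken to be the signing hand.
    return "right-handed" if vr.max() > vl.max() else "left-handed"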

Using this method, they segment words correctly 80% of the time and misclassify transitions as words 11% of the time. They report that the segmentation improves recognition accuracy for words (from 78% to 87%) and for sentences (from 56% to 58%).

Discussion



So basically they're using Sezgin's methods. I don't like all the thresholds. They should have done something more statistically valid and robust, since this approach requires extensive training and is very training-set dependent. Furthermore, different signs and gestures seem likely to need different thresholds, so training a single set of thresholds over the whole data set means some of them will always get segmented wrong. I guess this is why their accuracy is less than stellar.

Basically, they just look at which hand is moving more, or whether both hands are moving about the same amount, to tell one- vs. two-handed and right- vs. left-handed. Meh. Not that impressed.

BibTeX



@inproceedings{796189,
author = {Hirohiko Sagawa and Masaru Takeuchi},
title = {A Method for Recognizing a Sequence of Sign Language Words Represented in a Japanese Sign Language Sentence},
booktitle = {FG '00: Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition 2000},
year = {2000},
isbn = {0-7695-0580-5},
pages = {434},
publisher = {IEEE Computer Society},
address = {Washington, DC, USA},
}
