Wednesday, February 13, 2008

Kim - Gesture Rec. for Korean Sign Language

Jong-Sung Kim, Won Jang, and Zeungnam Bien, "A dynamic gesture recognition system for the Korean sign language (KSL)," IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 26, no. 2, pp. 354-359, Apr. 1996

Summary



Kim et al. present a system for recognizing a subset of gestures in KSL. They claim KSL can be expressed with 31 distinct gestures and choose 25 of them for this initial study. Each hand is instrumented with a DataGlove, which gives two bend values per finger, and a Polhemus tracker (the usual 6 DOF) to get position and orientation information.

They recognize signs using the following recipe (a rough sketch of the whole pipeline appears after the list):

  1. Bin the movements along each axis (bins of width = 4 inches) to filter/smooth the data

  2. Vector quantize the movements of each hand into one of 10 "directional" classes that describe how each hand is moving.

  3. Feed the glove sensor readings into a fuzzy min-max neural network to classify which of 14 postures it is, with a rejection threshold in case the input matches none of the postures

  4. Combine the direction and posture results to decide which sign was intended
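
To make the recipe concrete, here's a rough end-to-end sketch in numpy. To be clear, almost everything here is my own guesswork: the paper doesn't spell out its 10 directional classes or the exact fuzzy min-max membership function in a way I can reproduce, so the sign-of-delta direction coding, the gamma parameter, the 0.8 rejection threshold, and all the names below are mine, not theirs.

import numpy as np

BIN_WIDTH = 4.0  # inches, per the paper

def bin_position(pos):
    # Step 1: snap each axis to a 4-inch bin to smooth out sensor jitter.
    return np.floor(np.asarray(pos, dtype=float) / BIN_WIDTH).astype(int)

def quantize_direction(prev_bin, curr_bin):
    # Step 2: map the binned movement to a directional class. This toy
    # version just keys on the sign of the change along each axis.
    return tuple(np.sign(curr_bin - prev_bin))  # components in {-1, 0, 1}

def hyperbox_membership(x, lo, hi, gamma=4.0):
    # Step 3 (sketch): fuzzy min-max networks score an input against a
    # class hyperbox [lo, hi]; membership decays the farther x falls
    # outside the box. gamma controls how fast it decays.
    violation = np.maximum(0.0, lo - x) + np.maximum(0.0, x - hi)
    return float(np.mean(np.maximum(0.0, 1.0 - gamma * violation)))

def classify_posture(glove, hyperboxes, reject_threshold=0.8):
    # Best-scoring posture wins, unless nothing clears the rejection
    # threshold, in which case return None ("not any of the postures").
    scores = {label: max(hyperbox_membership(glove, lo, hi) for lo, hi in boxes)
              for label, boxes in hyperboxes.items()}
    label, score = max(scores.items(), key=lambda kv: kv[1])
    return label if score >= reject_threshold else None

def recognize_sign(direction_seq, posture, sign_table):
    # Step 4: a (direction sequence, posture) pair indexes into a sign table.
    return sign_table.get((tuple(direction_seq), posture))

A real direction coder would collapse those sign tuples into the paper's 10 classes, but without the class definitions I can't do better than this.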



Discussion



They remark that many of the classes are not linearly separable, which is a problem in many domains. Support vector machines with a nonlinear kernel can often do a very good job of separating such data. I wonder why no one has used them so far. Probably because they're fairly complex.
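
For what it's worth, here's a minimal sketch of what I mean, using scikit-learn's SVC on toy two-moons data (a classic non-linearly-separable set; all the parameter choices here are mine, not anything from the paper):

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy stand-in for glove features: two interleaved half-moons. An RBF
# kernel lifts the data into a space where the classes become separable.
X, y = make_moons(n_samples=500, noise=0.15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))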

I also like the idea of thinking of gestures as a signal. I don't know why, but this analogy has escaped me so far. There is a technique for detecting "interesting anomalies" in signals using PCA; I wonder if it would work for the segmentation problem.
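
A rough sketch of the PCA idea, under my own assumptions: fit PCA on sliding windows of a 1-D motion signal and treat high reconstruction error as "interesting." The synthetic signal, window size, and component count below are all arbitrary choices:

import numpy as np
from sklearn.decomposition import PCA

def anomaly_scores(signal, window=10, n_components=3):
    # Slide a window over the 1-D signal, fit PCA on the windows, and use
    # reconstruction error as an "interestingness" score per window.
    windows = np.lib.stride_tricks.sliding_window_view(signal, window)
    pca = PCA(n_components=n_components).fit(windows)
    recon = pca.inverse_transform(pca.transform(windows))
    return np.linalg.norm(windows - recon, axis=1)

# Synthetic stand-in for hand speed: smooth motion with a jolt in the middle.
t = np.linspace(0, 10, 500)
speed = np.sin(t) + 0.05 * np.random.randn(500)
speed[250:260] += 2.0                 # the "anomaly"
scores = anomaly_scores(speed)
print(np.argmax(scores))              # lands near index 250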

How do they determine the initial position for the glove coordinates? If they get it wrong, all their measurements will be off, and the vector quantization of the movements will probably fail. They should probably skip this whole initial-starting-point business and just use the change from the last position. Maybe that's what they really mean, but it's unclear.
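
What that alternative would look like, in a two-line sketch (the toy Polhemus stream is made up; the point is that np.diff makes the arbitrary starting offset cancel out):

import numpy as np

# Toy tracker stream: (T, 3) absolute positions with an arbitrary offset.
positions = np.cumsum(np.random.randn(100, 3), axis=0) + np.array([50.0, 0.0, 12.0])
deltas = np.diff(positions, axis=0)   # per-frame movement; the offset drops out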

Also, it seems like their method of filtering/smoothing the position/movement data by binning the values is a fairly hackish technique. There are robust methods for filtering noisy data that should have been used instead.
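
For example, a median filter is a standard robust choice that suppresses impulse glitches without coarse quantization artifacts. A quick sketch on made-up position data (the kernel_size=5 is an arbitrary choice of mine):

import numpy as np
from scipy.signal import medfilt

# Noisy (T, 3) position track with an impulse glitch.
positions = np.cumsum(0.1 * np.random.randn(200, 3), axis=0)
positions[50, 0] += 5.0               # simulated sensor glitch
smoothed = np.column_stack([medfilt(positions[:, i], kernel_size=5)
                            for i in range(positions.shape[1])])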

And finally, their results. They report 85%, which doesn't seem /too/ bad for a first try. But then they try to rationalize that 85% is good enough, saying that "the deaf-mute who [sic] use gestures often misunderstand each other." Well, that's a little condescending, now, isn't it? They also blame everything except their own algorithm, and "find that abnormal motions in the gestures and postures, and errors of sensors are partly responsible for the observed mis-classification." So you want things to work perfectly for you? You want life to play fair? News flash: if things were perfect, you would be out of a job and there would be no reason for the paper you just wrote. Things are hard. Deal with it. Life's not perfect, my sensors won't give me perfect data, and I can't draw a perfectly straight line by hand. None of that is an excuse not to make your algorithm account for those imperfections.

Also, how did they combine the quantized movement data (the ten movement classes) with the posture classifications? The neural network handled only the postures, not the combination, right?

BibTeX



@ARTICLE{485888,
  title    = {A dynamic gesture recognition system for the Korean sign language (KSL)},
  author   = {Jong-Sung Kim and Won Jang and Zeungnam Bien},
  journal  = {IEEE Transactions on Systems, Man, and Cybernetics, Part B},
  year     = {1996},
  month    = {Apr},
  volume   = {26},
  number   = {2},
  pages    = {354-359},
  keywords = {data gloves, fuzzy neural nets, pattern recognition, Korean sign language, data-gloves, dynamic gesture recognition system, fuzzy min-max neural network, online pattern recognition},
  doi      = {10.1109/3477.485888},
  issn     = {1083-4419},
}

2 comments:

Brandon said...

wow that's kind of harsh. not saying it's wrong... just harsh lol.

i agree that this paper has been like a few others we have read - the ideas behind it seem decent but the execution was very poor.

Paul Taele said...

So, I'll just focus on the SVM comment. You brought up something that has been bugging me for quite some time. I hope we come across a paper that uses an SVM approach or tells us why SVMs are impractical for hand gesture recognition. It seems the "not linearly separable" remark in the paper wasn't really resolved.