Summary
Iba et al. describe a system that controls robots with hand gestures. Gestures are captured with a CyberGlove (finger and hand posture) and a Polhemus tracker (hand position and orientation in six degrees of freedom) and recognized with hidden Markov models. They argue that a glove-based interface can be a more intuitive way to direct robot movement. This is not necessarily the case for a single robot, where a joystick can be more accurate. Their primary claim is that groups of robots, where commanding each robot individually becomes intractable and burdensome, are easily controlled as a group using gestures such as pointing and general motion commands. The commands were: an open, flat hand to continue motion; pointing to 'go there'; a wave left or right to turn in that direction; and a closed fist to stop.
Their hardware samples finger and wrist position and flexion at 30 Hz. The data are sent to a preprocessor, where the 18 values from the glove are reduced by linear combination to 10 values, which are then augmented with their first derivatives (the change since the previous sample). The resulting 20-dimensional vectors are vector-quantized into codewords. Each codeword represents a coarse-grained view of the position and movement of the fingers and hand, and the codebook is trained off-line. 'Postures', then, become codewords, and gestures are sequences of codewords.
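A minimal Python sketch of this preprocessing pipeline follows, under stated assumptions: the 10 x 18 weight matrix and the codebook are hypothetical placeholders (the paper's coefficients and trained codebook are not reproduced here), and the nearest-codeword search uses squared Euclidean distance, the usual choice for vector quantization but not something the summary above confirms. The helper names are illustrative only.

```python
import math
from typing import List, Sequence

Vector = List[float]


def project(raw: Sequence[float], weights: Sequence[Sequence[float]]) -> Vector:
    """Reduce one raw glove frame (18 readings) by linear combination.

    `weights` is a hypothetical 10 x 18 matrix; the paper's coefficients are not given.
    """
    return [sum(w * x for w, x in zip(row, raw)) for row in weights]


def feature_vector(curr: Vector, prev: Vector) -> Vector:
    """Append first differences to the reduced values: 10 + 10 = 20 dimensions."""
    return curr + [c - p for c, p in zip(curr, prev)]


def quantize(vec: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the nearest codeword by squared Euclidean distance."""
    best_index, best_dist = 0, math.inf
    for i, word in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vec, word))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def to_codewords(frames, weights, codebook) -> List[int]:
    """Map a stream of raw frames (sampled at 30 Hz) to codeword indices.

    The first frame has no predecessor, so it yields no symbol in this sketch.
    """
    reduced = [project(f, weights) for f in frames]
    return [
        quantize(feature_vector(reduced[t], reduced[t - 1]), codebook)
        for t in range(1, len(reduced))
    ]


# Toy example: keep the first 10 sensors as-is and use a two-word codebook.
weights = [[1.0 if j == i else 0.0 for j in range(18)] for i in range(10)]
codebook = [[0.0] * 20, [1.0] * 10 + [0.0] * 10]
frames = [[0.0] * 18, [1.0] * 18, [0.0] * 18]
print(to_codewords(frames, weights, codebook))  # [1, 0]
```

The output is the kind of symbol stream the gesture HMMs consume: each posture-plus-motion frame collapses to a single small integer.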
The last n codewords are fed into an HMM that includes a mechanism for rejection: a 'wait' state that branches to the sub-HMM for each gesture. The sequence is classified as the gesture whose HMM gives the highest probability, computed with the forward algorithm.
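The sketch below shows forward-algorithm scoring over codeword sequences in Python. It simplifies the rejection mechanism: rather than a wait state wired into the gesture HMMs, it uses a separate hypothetical 'wait' model with uniform emissions and rejects the window whenever that model scores at least as high as the best gesture. The class, function, and toy parameters are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Dict, List, Optional, Sequence


class DiscreteHMM:
    """HMM over codeword indices (hypothetical helper, not the authors' code)."""

    def __init__(self, start: List[float], trans: List[List[float]], emit: List[List[float]]):
        self.start = start  # start[i] = P(first state is i)
        self.trans = trans  # trans[i][j] = P(next state j | current state i)
        self.emit = emit    # emit[i][k] = P(codeword k | state i)

    def log_likelihood(self, obs: Sequence[int]) -> float:
        """Forward algorithm with per-step scaling; obs must be non-empty.

        Returns log P(obs | model).
        """
        n = len(self.start)
        alpha = [self.start[i] * self.emit[i][obs[0]] for i in range(n)]
        log_p = 0.0
        for t in range(len(obs)):
            if t > 0:
                alpha = [
                    sum(alpha[i] * self.trans[i][j] for i in range(n)) * self.emit[j][obs[t]]
                    for j in range(n)
                ]
            scale = sum(alpha)
            if scale == 0.0:
                return -math.inf  # sequence impossible under this model
            log_p += math.log(scale)
            alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        return log_p


def classify(obs: Sequence[int], gestures: Dict[str, DiscreteHMM],
             wait: DiscreteHMM) -> Optional[str]:
    """Pick the most likely gesture, or None if the wait model explains obs at least as well."""
    scores = {name: model.log_likelihood(obs) for name, model in gestures.items()}
    best = max(scores, key=scores.get)
    if wait.log_likelihood(obs) >= scores[best]:
        return None
    return best


# Toy alphabet of 3 codewords; single-state models keep the example small.
stop = DiscreteHMM([1.0], [[1.0]], [[0.9, 0.05, 0.05]])
go = DiscreteHMM([1.0], [[1.0]], [[0.05, 0.9, 0.05]])
wait = DiscreteHMM([1.0], [[1.0]], [[1 / 3, 1 / 3, 1 / 3]])
gestures = {"stop": stop, "go": go}
print(classify([0, 0, 0], gestures, wait))  # stop
print(classify([2, 2, 2], gestures, wait))  # None
```

In a live system this would be called on a sliding window, such as the last n symbols produced by the quantizer. Because the probabilities are rescaled at every step, long windows do not underflow.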
They test the recognizer with and without the wait state to show that the wait state helps reject false positives. This matters because you don't want the robot to move when you didn't mean it to, whereas a false negative can be fixed by simply repeating the gesture. With the wait state, they got 96% true positives and only 1.6 false positives per 1000; without it, they got 100% true positives but 20 false positives per 1000.
Discussion
How did they come up with the best linear combination to use when reducing the glove data from 18 values to 10?
I would like to see details on how they created their codebook. They say they used 5000 measurements that are representative of the feature space, but the feature space in this case is huge! Say each of the 18 sensors has 255 values. The 6 DOF of the hand tracker are three angular measurements, with 360 values each (assuming integer-degree precision), and three real-valued position measurements. Say the tracker is accurate to the inch and the cord is ten feet long. Let's make it easier for them and say you can only stand in front of the tracker, not behind it, so that's a hemisphere of radius ten feet with volume (1/2)*(4/3)*PI*(10^3) ≈ (2/3)*3*1000 = 2000 cubic feet (rounding PI to 3). Let's cut that in half because some of the sphere is inaccessible (it goes into the floor or above the ceiling), leaving 1000 cubic feet, or 1000 * 12^3 = 1.728e6 cubic inches. Since every reading can vary independently, the possibilities multiply: the number of distinct raw readings the hardware can produce is 255^18 * 360^3 * 1.728e6, on the order of 10^57. My assumptions about the ranges are probably off, but even if I'm off by 3 orders of magnitude, that's still a WHOLE FRIGGIN LOT MORE THAN 5000 POSITIONS. So how did they 'cover the entire space' adequately? Maybe they did, I don't know, but I'm skeptical. I suppose my beef is with their claim that they cover the ENTIRE SPACE. I doubt it.
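The back-of-the-envelope count can be checked with a few lines of Python. Because each reading can vary independently of the others, the ranges multiply, so the sketch sums base-10 logarithms rather than forming the huge product directly. All ranges are the rough assumptions from the paragraph above, not figures from the paper; the sketch uses math.pi instead of rounding π to 3, which gives about 1.81e6 one-inch cells rather than 1.728e6.

```python
import math

# Assumptions from the paragraph above (all rough, none from the paper).
SENSORS, SENSOR_LEVELS = 18, 255   # glove sensors and levels per sensor
ANGLES, ANGLE_LEVELS = 3, 360      # tracker angles at 1-degree precision
RADIUS_FT = 10                     # cord length bounds how far the hand can reach

hemisphere_ft3 = 0.5 * (4 / 3) * math.pi * RADIUS_FT ** 3
usable_ft3 = hemisphere_ft3 / 2            # drop the half lost to floor and ceiling
position_cells = usable_ft3 * 12 ** 3      # one-inch cubes the hand can occupy

# Independent readings multiply, so their base-10 logarithms add.
log10_states = (
    SENSORS * math.log10(SENSOR_LEVELS)
    + ANGLES * math.log10(ANGLE_LEVELS)
    + math.log10(position_cells)
)
print(f"~10^{log10_states:.0f} raw states vs. 5000 codebook samples")
```

This prints roughly 10^57, so even a generous error in the assumed ranges leaves the raw space astronomically larger than 5000 samples.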
Something like multi-dimensional scaling might tell you which features are important. Or you could use an iterative, interactive process for creating new codewords: start with their initial codebook, then for each incoming feature vector (or a random sample of them), check whether it is 'close enough' to an existing codeword, or falls into a cluster with 'high enough' probability (if the codeword clusters were described by a mixture of Gaussians, with the codewords being the means). If it doesn't, start a new cluster. Maybe they did something like this, but they didn't say.
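Here is a minimal Python sketch of the 'close enough' variant of that idea, a form of sequential threshold (leader-style) clustering. It is not the Gaussian-mixture version and not anything the authors describe. Each sample either joins its nearest codeword, which moves to the running mean of its members, or starts a new codeword when it lies farther than a hypothetical threshold from every existing one. The function name and threshold are illustrative assumptions.

```python
import math
from typing import List, Sequence


def grow_codebook(codebook: Sequence[Sequence[float]],
                  samples: Sequence[Sequence[float]],
                  threshold: float) -> List[List[float]]:
    """Sequential threshold clustering: reuse the nearest codeword or start a new one.

    `threshold` is a hypothetical tuning knob: the largest Euclidean distance
    at which a sample still counts as 'close enough' to an existing codeword.
    """
    book = [list(c) for c in codebook]
    counts = [1] * len(book)  # assumption: each starting codeword counts as one sample
    for x in samples:
        nearest, dist = None, math.inf
        for i, word in enumerate(book):
            d = math.dist(x, word)
            if d < dist:
                nearest, dist = i, d
        if nearest is None or dist > threshold:
            book.append(list(x))  # not close to anything: start a new cluster
            counts.append(1)
        else:
            counts[nearest] += 1
            # Move the codeword to the running mean of its members.
            book[nearest] = [w + (xi - w) / counts[nearest]
                             for w, xi in zip(book[nearest], x)]
    return book


book = grow_codebook([[0.0, 0.0]],
                     [[0.1, 0.0], [5.0, 5.0], [5.2, 5.0], [0.0, 0.1]],
                     threshold=1.0)
print(len(book))  # 2
```

The threshold plays the role of 'close enough': a smaller value yields more, finer-grained codewords, which in turn enlarges the HMM's input alphabet.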
So aside from those two long, preachy paragraphs, I really liked this algorithm. Quantizing the feature vectors into codewords means the HMMs only have to deal with a fixed set of 32 possible inputs, so the observations are discrete, the models are easier to train, and you know exactly what to expect.
BibTeX
@INPROCEEDINGS{iba1999gestureControlledRobots,
title={An architecture for gesture-based control of mobile robots},
author={Iba, S. and Weghe, J.M.V. and Paredis, C.J.J. and Khosla, P.K.},
booktitle={Intelligent Robots and Systems, 1999. IROS '99. Proceedings. 1999 IEEE/RSJ International Conference on},
year={1999},
volume={2},
pages={851--857},
keywords={data gloves, gesture recognition, hidden Markov models, mobile robots, user interfaces, HMM, data glove, gesture-based control, global control, hand gestures, local control, wait state},
doi={10.1109/IROS.1999.812786}
}