Summary
Vision-based posture recognition with a neural network (20 static postures). Segmentation is manual, and the recordings are rigged so that the postures are nearly perfect. The hands and fingers are tracked by a VICON system (3D coordinates) and the tracks are filtered. The features are the distances between landmarks on the hand, normalized to account for varying hand sizes. These are fed into a neural network with 2 hidden layers of 250 units each, trained for 3000 epochs or until the root mean squared error drops below 0.0001. Accuracy is about 95%.
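To make the feature step concrete, here is a minimal sketch of normalized pairwise landmark distances. The function name and the normalization are my assumptions: the summary only says the distances are normalized for hand size, so dividing by the largest pairwise distance is one plausible choice, not necessarily what the authors did.

```python
import math
from itertools import combinations


def distance_features(landmarks):
    """Return pairwise landmark distances scaled to be hand-size invariant.

    landmarks: list of (x, y, z) tuples, one per tracked point on the hand.
    Assumption: normalize by the largest pairwise distance. The paper's
    actual normalization is not stated in the summary.
    """
    # One Euclidean distance per unordered pair of landmarks.
    dists = [math.dist(a, b) for a, b in combinations(landmarks, 2)]
    scale = max(dists)
    if scale == 0:
        raise ValueError("all landmarks coincide; cannot normalize")
    return [d / scale for d in dists]


# Hypothetical landmark sets: the second is the first scaled by 2.
small_hand = [(0, 0, 0), (3, 0, 0), (0, 4, 0)]
large_hand = [(0, 0, 0), (6, 0, 0), (0, 8, 0)]
```

With this normalization, `distance_features(small_hand)` and `distance_features(large_hand)` come out the same, which is the point of normalizing: the same posture made by a bigger hand yields the same feature vector.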
Discussion
Holy overtraining, Batman! That's a lot of hidden units! Especially for a problem set up to be this easy: static postures, practiced until they were perfect. Recognition should be close to perfect. Just do template matching. Also, don't include your training data in your test set.
Your features must really suck if you can't get closer to 100% on this problem (like > 99%). Even if you do pixel-by-pixel template matching, you should get pretty darn close to 99%. Heck, even handwritten digit recognition is close to 100%.
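For reference, this is roughly what I mean by template matching: store one feature vector per posture and pick the closest one. It's a sketch under my own assumptions (Euclidean distance, one template per posture, made-up labels and numbers), not anything taken from the paper.

```python
import math


def classify(features, templates):
    """Return the label of the template nearest to `features`.

    templates: dict mapping posture label -> feature vector, all the same
    length as `features`. Euclidean distance is an assumption; any other
    distance measure could be swapped in.
    """
    return min(templates, key=lambda label: math.dist(features, templates[label]))


# Hypothetical templates, e.g. averaged from training recordings only.
templates = {
    "fist": [0.2, 0.2, 1.0],
    "open": [0.9, 0.8, 1.0],
}
```

Calling `classify([0.25, 0.2, 1.0], templates)` picks `"fist"` because that template is closer. The templates should be built from training recordings only, with accuracy measured on recordings that were held out.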
2 comments:
At the time of reading this paper, I wasn't impressed with the authors applying a straight NN to this domain, as all they did was use a popular machine learning technique on a less commonly studied sign language dialect. The problems you highlighted, and the comparison to template matching, make the paper even less impressive.
Their features weren't that good, since they could not distinguish between a closed hand and a slightly open hand. The problem is that they were using only vision techniques and had a bounding box for the hand, so a slightly open hand can't be seen.