Summary
The authors present a vision-based system for simple posture recognition: a hand with one to five digits extended. They perform skin-color matching and segmentation on normalized RGB values to extract a hand blob. Pixel chromaticities (the r and g dimensions) are modeled as 2D Gaussians and clustered, and the cluster corresponding to skin color is chosen subject to minimum and maximum pixel-count thresholds. Filtering is used to make the hand blob continuous. They then locate the center of the hand blob and, using concentric circles of expanding radii, count the number of extended digits. This count is their classification.
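To make the pipeline concrete, here is a minimal sketch of the two distinctive steps as I read them. It assumes NumPy; every function name, threshold, and heuristic below is my guess at the mechanics, not code from the paper.

import numpy as np

def skin_mask(bgr, mean_rg, cov_rg, mahal_thresh=3.0):
    """Guess at the skin test: squared Mahalanobis distance to a 2D
    Gaussian skin model in normalized (r, g) chromaticity space."""
    img = bgr.astype(np.float64) + 1e-6           # avoid division by zero
    s = img.sum(axis=2)
    r = img[:, :, 2] / s                          # normalized red
    g = img[:, :, 1] / s                          # normalized green
    d = np.stack([r - mean_rg[0], g - mean_rg[1]], axis=-1)
    m2 = (d @ np.linalg.inv(cov_rg) * d).sum(axis=-1)
    return ((m2 < mahal_thresh ** 2) * 255).astype(np.uint8)

def count_fingers(mask, center, radii):
    """Guess at the classifier: sample concentric circles around the hand
    center and count contiguous skin runs crossing each circle; beyond the
    palm, each extended digit should contribute one run."""
    h, w = mask.shape
    run_counts = []
    for radius in radii:
        thetas = np.linspace(0.0, 2 * np.pi, 360, endpoint=False)
        xs = np.clip(np.round(center[0] + radius * np.cos(thetas)).astype(int), 0, w - 1)
        ys = np.clip(np.round(center[1] + radius * np.sin(thetas)).astype(int), 0, h - 1)
        ring = mask[ys, xs] > 0
        # rising edges = number of skin runs on the circle (wrap-around safe)
        run_counts.append(int(np.count_nonzero(ring & ~np.roll(ring, 1))))
    # crude vote over radii; the paper presumably does something smarter,
    # e.g. discounting the run where the wrist crosses the circle
    return int(np.bincount(run_counts).argmax())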
Discussion
If the hand is unusually small or large, or the camera is zoomed in or out, the number of hand pixels may fall outside the fixed min/max thresholds, so the hand won't be picked up correctly by the skin-detection/hand-tracking stage.
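To be concrete about the failure mode, the gate I'm imagining looks something like this; the constants and names are hypothetical, not taken from the paper:

import numpy as np

MIN_PIXELS, MAX_PIXELS = 2_000, 40_000  # illustrative; tuned for one hand size at one zoom

def is_hand_blob(blob_mask):
    n = int(np.count_nonzero(blob_mask))
    # A zoomed-in or large hand overshoots MAX_PIXELS; a distant or small
    # hand falls under MIN_PIXELS. Either way the blob is rejected.
    return MIN_PIXELS <= n <= MAX_PIXELS

Because the thresholds are in absolute pixels, the gate is tied to one apparent hand size.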
Fingers must be spread far enough apart that the concentric-circle sampling registers two digits rather than just one.
I'd like details on how they find the center of the hand for their circles. I'd also like details on how they identify individual fingers. For example, for their "click" gesture, do they just assume that a digit at 90 degrees to the hand that goes seen/unseen/seen is a thumb moving? How do they sample the frames to get those three states?
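The paper doesn't say (hence my question), but one common trick in this kind of work, offered purely as a guess, is to take the peak of the blob's distance transform; the pixel farthest from any background pixel usually sits in the palm:

import cv2

def hand_center(mask):
    """Guessed method, not the paper's: the distance-transform peak of the
    hand blob lands in the palm, and the peak value doubles as a rough
    palm radius for seeding the first concentric circle."""
    dist = cv2.distanceTransform(mask, cv2.DIST_L2, 5)
    _, palm_radius, _, center = cv2.minMaxLoc(dist)
    return center, palm_radius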
First sentence: "less obtrusive." Figure 1: crazy 50-pound head-sucker thing. Give me my keyboard and mouse back.
BibTeX
@inproceedings{storring2004visionGestureRecAR
,author="M. Störring and T.B. Moeslund and Y. Liu and E. Granum"
,title="Computer Vision-based Gesture Recognition for an Augmented Reality Interface"
,booktitle="4th IASTED International Conference on Visualization, Imaging, and Image Processing"
,address="Marbella, Spain"
,month="Sep"
,year="2004"
,pages="766--771"
}