Wednesday, February 20, 2008

Freeman - Television Control

Freeman, William T. and Craig D. Weissman. "Television Control by Hand Gestures." In Proceedings of the IEEE Intl. Wkshp. on Automatic Face and Gesture Recognition, Zurich, June, 1995.

Summary



Freeman and Weissman present a system for controlling a television (channel and volume) using "hand gestures." The user holds up a hand in front of a television/computer combo, and the computer recognizes the open hand using image processing techniques. When an open hand is seen, a menu opens with controls (buttons and a slider bar) for the channel and volume. The user moves the hand around and hovers it over a control to activate it. To stop, the user closes the hand or otherwise removes the open hand from the camera's field of view. The open palm is recognized with a cosine similarity metric (normalized correlation) between a pre-defined image of a palm and the image patch at every possible offset within the camera image.
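To make the matching step concrete, here is a minimal sketch of normalized correlation over every offset, assuming grayscale images stored as nested lists of numbers. The function names (normalized_correlation, find_open_hand) are my own for illustration, and this is not the authors' implementation, just the brute-force idea as I understand it from the paper.

```python
import math


def normalized_correlation(patch, template):
    """Cosine similarity between two equally sized, flattened pixel lists."""
    dot = sum(p * t for p, t in zip(patch, template))
    norm = math.sqrt(sum(p * p for p in patch)) * math.sqrt(sum(t * t for t in template))
    # An all-zero patch has no direction, so treat it as "no match".
    return dot / norm if norm > 0 else 0.0


def find_open_hand(image, template):
    """Slide the template over every offset; return ((row, col), best score)."""
    image_h, image_w = len(image), len(image[0])
    tmpl_h, tmpl_w = len(template), len(template[0])
    flat_template = [v for row in template for v in row]

    best_offset, best_score = None, -1.0
    for y in range(image_h - tmpl_h + 1):
        for x in range(image_w - tmpl_w + 1):
            # Flatten the image patch under the template at offset (y, x).
            patch = [image[y + dy][x + dx] for dy in range(tmpl_h) for dx in range(tmpl_w)]
            score = normalized_correlation(patch, flat_template)
            if score > best_score:
                best_offset, best_score = (y, x), score
    return best_offset, best_score
```

A real system would presumably threshold the best score to decide whether an open hand is present at all, then use the winning offset as the cursor position. The threshold value is something I'd have to guess at, so it's left out here.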

Discussion



Not in the mood to write decent prose, so here's a list.

  • Is natural language really that much better? First, it contains a lot of ambiguity that mouse/keyboard don't have. Second, you'd have just as many problems defining a vocabulary of commands using language as you would gestures, especially since there are so many words/synonyms/etc.

  • Their example of a 'complicated gesture' is a goat shadow puppet. Seriously? I think this is a little exaggerated and a lot ridiculous.

  • These aren't really gestures. It's just image tracking that boils down to nothing more than a mouse. What have you saved? Just buy 10 more remotes and glue them to things so you have one in every sitting spot and they can't be lost.

  • I don't know the image rec. research area, so I can't comment too much on their algorithm. But this seems like it would be super slow (taking all possible offsets) and have issues with scaling (what if the hand template is the wrong size, esp too small for the actual hand in the camera image).
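To put rough numbers on the speed worry in the last bullet, here is a back-of-the-envelope count of the work a brute-force search does. The image and template sizes are made up for illustration (they are not from the paper), and brute_force_cost is a hypothetical helper name.

```python
def brute_force_cost(image_h, image_w, tmpl_h, tmpl_w):
    """Count offsets and pixel multiplications for one full template search."""
    # Every top-left position where the template fits entirely inside the image.
    offsets = (image_h - tmpl_h + 1) * (image_w - tmpl_w + 1)
    # Each offset needs one multiply per template pixel for the dot product.
    multiplies = offsets * tmpl_h * tmpl_w
    return offsets, multiplies


# Assumed sizes: a 320x240 camera frame and a 32x32 palm template.
offsets, multiplies = brute_force_cost(240, 320, 32, 32)
print(offsets, multiplies)  # 60401 61850624
```

So one frame at those assumed sizes is about 62 million multiplies for a single template size. Handling the scaling issue by trying several template sizes adds one such search per size.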



BibTeX



@inproceedings{freeman1995televisionGestures
,author="William T. Freeman and Craig D. Weissman"
,title="Television Control by Hand Gestures"
,booktitle="IEEE Intl. Wkshp. on Automatic Face and Gesture Recognition"
,address="Zurich"
,year="1995"
,month="June"
}

2 comments:

Grandmaster Mash said...

True, natural language has a lot of ambiguity. But it would still be nice for my computer/TV to understand my intention.

And no, their hand waving was not really a gesture. It was a single posture that changed positions.

Paul Taele said...

I really wanted to like this paper, and I think that it's still potentially a viable alternative to the gluing-ten-remotes method (I'd still do it anyway :P). The idea for such a system was just too early for its time, given the infancy of vision-based methods. Probably would have been better to ditch the focus on the hands and go for a more mouse-like approach.