Saturday, May 10, 2008

Eisenstein - Device independence and extensibility

Eisenstein, J.; Ghandeharizadeh, S.; Golubchik, L.; Shahabi, C.; Donghui Yan; Zimmermann, R., "Device independence and extensibility in gesture recognition," Virtual Reality, 2003. Proceedings. IEEE , vol., no., pp. 207-214, 22-26 March 2003

Summary



They're trying to make a recognition framework that is device independent, so you can plug data into it from any device. The first layer is the raw data, where each sensor value is device dependent. The second layer is a set of predicates that combine raw values into postures. The third layer is a set of temporal predicates that describe changes in posture over time. The fourth layer is a set of gestural templates that assign temporal patterns of postures to specific gestures.

Sensor values from layer 1 are mapped to predicates by hand for each device. Templates are added once, combining predicates. Use a bunch of neural networks to combine predicates into temporal data, and a bunch more neural networks to combine temporal data into gestures. Train all these networks with a whole lot of data.
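Here's a minimal sketch of how I picture the four layers fitting together. The predicate and gesture names, sensor channels, and thresholds are all made up, and I hard-code rules where the paper trains neural networks, so this only shows the structure:

# Rough sketch of the four-layer idea. Everything concrete here is invented;
# the paper learns the predicate -> temporal -> gesture mappings with neural
# networks rather than hand-written thresholds.

# Layer 1: raw, device-dependent sensor values (pretend flex readings in [0,1]).
def read_raw_frame():
    return {"index_flex": 0.9, "middle_flex": 0.85, "thumb_flex": 0.1}

# Layer 2: posture predicates -- the only device-dependent mapping.
def posture_predicates(raw):
    return {
        "fist": raw["index_flex"] > 0.8 and raw["middle_flex"] > 0.8,
        "thumb_out": raw["thumb_flex"] < 0.3,
    }

# Layer 3: temporal predicates describe how postures change over time.
def temporal_predicates(predicate_seq):
    return {"fist_formed": not predicate_seq[0]["fist"] and predicate_seq[-1]["fist"]}

# Layer 4: gestural templates assign temporal patterns to specific gestures.
def match_gesture(temporal):
    return "grab" if temporal["fist_formed"] else None

frames = [{"index_flex": 0.2, "middle_flex": 0.3, "thumb_flex": 0.1}, read_raw_frame()]
print(match_gesture(temporal_predicates([posture_predicates(f) for f in frames])))  # "grab"

Only layer 2 has to be redone per device; everything above it stays the same, which is where the device independence is supposed to come from.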

Discussion



They omit gestures with a temporal component from their experiments, so they only test the static ASL letters (everything except Z and J). This is pretty cheesy. Also, they are shooting for device independence, but you still have to map and train the connections between raw sensor values and predicates by hand for each device. I understand you'd have to do this for any application, but it seems to defeat their purpose. I guess their benefits come at the higher levels, where you use predicates regardless of how they were constructed.

This seems crazy complicated for /very/ little accuracy. Using neural networks to classify the static ASL letters (all but Z and J), they only get 67% accuracy. Other approaches get close to 95-99% on the same data. I guess things are a little too complex.

Schlomer - Gesture Rec with Wii Controller

Schlomer, Poppinga, Henze, Boll. Gesture Recognition with a Wii Controller (TEI 2008).

Summary



Wiimote. Acceleration data. Quantize the (x,y,z) acceleration data using k-means into a codebook of size 14. Plug the quantized data into a left-right HMM. Segment gestures by making the user press and hold the A button during a gesture. They can recognize about 90% of all gestures accurately (circle, square, tennis swing).
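A quick sketch of that pipeline as I understand it, using scikit-learn's k-means for the codebook and a tiny hand-rolled forward pass in place of their trained HMMs (the left-right structure and the untrained emission table below are placeholders):

import numpy as np
from sklearn.cluster import KMeans

# Quantize (x, y, z) acceleration frames into a 14-symbol codebook, then score
# the symbol sequence against a left-right HMM. In practice you'd train one
# HMM per gesture and pick the highest-scoring one.

rng = np.random.default_rng(0)
accel = rng.normal(size=(200, 3))              # stand-in for Wiimote samples

codebook = KMeans(n_clusters=14, n_init=10, random_state=0).fit(accel)
symbols = codebook.predict(accel)              # sequence of codeword indices

def left_right_transitions(n_states):
    """Each state either stays put or advances to the next state."""
    A = np.zeros((n_states, n_states))
    for i in range(n_states):
        A[i, i] = 0.5
        A[i, min(i + 1, n_states - 1)] += 0.5
    return A

def forward_loglik(symbols, A, B, pi):
    """Log-likelihood of a discrete observation sequence under an HMM."""
    alpha = pi * B[:, symbols[0]]
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for s in symbols[1:]:
        alpha = (alpha @ A) * B[:, s]
        loglik += np.log(alpha.sum())
        alpha /= alpha.sum()
    return loglik

n_states, n_symbols = 8, 14
A = left_right_transitions(n_states)
B = np.full((n_states, n_symbols), 1.0 / n_symbols)   # untrained emissions
pi = np.zeros(n_states); pi[0] = 1.0
print(forward_loglik(symbols, A, B, pi))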

Discussion



The point of this is.... what? Wii games, like Wii tennis and bowling, are pretty darned accurate at this already.

Murayama - Spidar G&G

Summary



We have a 6-DOF haptic device called the SPIDAR G that provides force feedback. Let's put two of them together, one for each hand. We'll make people do different things, like use one hand to manipulate a target and the other to manipulate an object they have to put into the target. If they run into something, we'll give them force feedback to say they hit something. When they use two SPIDARs as opposed to just one, they can usually do things faster.

Discussion



So all they do is take one machine, hook up another, and use the two of them. And using both, users tend to do things a little bit faster. This is pretty obvious, since instead of just moving the object, I can now also move the target. It reduces the amount of work I have to do.

I wonder if the strings of the Spidar would get in your way and limit your movement. Surely they would. Rotation would also be tough because you can't hold onto something and rotate it more than about 180 degrees.

No real evaluation performed, just a little bit of speedup data.

Lapides - 3D Tractus

Lapides, P., Sharlin, E., Sousa, M. C., and Streit, L. 2006. The 3D Tractus: A Three-Dimensional Drawing Board. In Proceedings of the First IEEE international Workshop on Horizontal interactive Human-Computer Systems (January 05 - 07, 2006). TABLETOP. IEEE Computer Society, Washington, DC, 169-176. DOI= http://dx.doi.org/10.1109/TABLETOP.2006.33

Summary



I want to draw in 3D, but all I have is this tablet PC. I know, I'll put the tablet PC on a table that moves up and down, simulating the third dimension. I'll make a drawing program with a user interface that shows the 3D drawing from a few angles so users won't get too confused. We'll use visual cues, like mapping depth to line width. Also, we'll use perspective projection so the user knows which things are 'below' the current plane of the tablet PC.

People used it and said it was neat.
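The depth cues are the most concrete bit, so here's a toy version of what I imagine they do (all constants and the exact projection are mine, not theirs): shrink stroke width with depth below the current plane, and perspective-project deeper points so they pull in toward the viewpoint above the table.

# Toy version of the depth cues; the constants are made up for illustration.
# Strokes farther below the tablet's current plane get thinner and are scaled
# toward a viewpoint above the table.

def stroke_width(depth_below_plane, base_width=4.0, falloff=0.02):
    """Thinner strokes for geometry farther below the current drawing plane."""
    return base_width / (1.0 + falloff * max(depth_below_plane, 0.0))

def project_to_plane(x, y, z, plane_z, eye_height=400.0):
    """Perspective-project a point below the plane, viewed from above."""
    depth = plane_z - z                           # distance below current plane
    scale = eye_height / (eye_height + max(depth, 0.0))
    return x * scale, y * scale, stroke_width(depth)

print(project_to_plane(100.0, 50.0, z=-30.0, plane_z=0.0))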

Discussion



This drawing/3D modeling approach is a little more realistic than other things we've read (Holosketch or the superellipsoid clay thing) since all you need is a tablet and one of their funky elevator things. So I'll give it kudos there.

I'm not sure how simple or intuitive it is, however, to have to move your tablet PC up and down to draw in the third dimension. I question the accuracy, especially if you're trying to line things up one on top of another, since you don't really have a good idea of where things are along the Z axis. This is especially hard if what you're trying to draw exists above the current plane of the tablet PC, since their software only shows you what's below the current plane.

Neat idea, but a bit klunky. Hope someone doesn't get their legs cut off if the tractus goes berserk.

Krahnstoever - Activity Recognition with Vision and RFID

Krahnstoever, N.; Rittscher, J.; Tu, P.; Chean, K.; Tomlinson, T., "Activity Recognition using Visual Tracking and RFID," Application of Computer Vision, 2005. WACV/MOTIONS '05 Volume 1. Seventh IEEE Workshops on , vol.1, no., pp.494-500, 5-7 Jan. 2005

Summary



A person is in an office or warehouse with cameras on them. Track their movements with a Monte Carlo model examining the image frames. Augment this with RFID tags embedded in all the objects the person can interact with. Do activity recognition by examining how the person is moving (vision) and what they are interacting with (RFID). RFID helps augment visual tracking for the purposes of activity recognition.
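My own toy version of the fusion idea (not their actual model): the tracker gives a coarse motion label, the RFID reader says which tagged object is in play, and a lookup over the pair names the activity.

# Toy fusion rule, not the paper's model: combine a motion label from visual
# tracking with the most recent RFID-tagged object to name an activity.
# The table entries are made-up examples.

ACTIVITY_TABLE = {
    ("reaching", "coffee_mug"): "getting coffee",
    ("reaching", "stapler"): "stapling papers",
    ("walking", None): "moving between desks",
}

def recognize(motion_label, rfid_reads):
    obj = rfid_reads[-1] if rfid_reads else None   # most recently seen tag
    return ACTIVITY_TABLE.get((motion_label, obj), "unknown")

print(recognize("reaching", ["coffee_mug"]))       # -> "getting coffee"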

Discussion



So they take an existing Monte Carlo visual tracking algorithm and magically throw RFID into the jar. They say this does better. Sort of a "duh" moment. Why do we let Civil Josh pick papers?

Bernardin - Grasping HMMs

Bernardin, K., K. Ogawara, et al. (2005). "A sensor fusion approach for recognizing continuous human grasping sequences using hidden Markov models." IEEE Transactions on Robotics 21(1): 47-57.

Summary



I have a robot that I want to teach to grab things. I can teach it by example. I have 14 different types of grasps that I use every day. I'll put pressure sensors in a glove under the CyberGlove. I will grab something, and then let it go. All of this data will be fed into an HMM in the HTK speech recognition toolkit. The HMM will tell me which grasp I am making with up to about 90% accuracy.
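A sketch of what I assume the feature frames look like (joint angles plus pressure readings concatenated per frame) and how the per-grasp HMMs would compete. The sensor counts are my guesses, and the actual training and scoring happens in HTK:

import numpy as np

# Sketch of the data layout only; the paper does the HMM training and scoring
# in HTK. Each frame concatenates CyberGlove joint angles with the pressure
# readings, one HMM is trained per grasp class, and the most likely model wins.

N_JOINTS = 22        # CyberGlove joint-angle channels (22-sensor model; a guess)
N_PRESSURE = 12      # number of pressure sensors (also a guess)

def make_frame(joint_angles, pressures):
    assert len(joint_angles) == N_JOINTS and len(pressures) == N_PRESSURE
    return np.concatenate([joint_angles, pressures])

def classify(sequence, models):
    """models maps grasp names to anything with a score(sequence) log-likelihood."""
    return max(models, key=lambda grasp: models[grasp].score(sequence))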

Discussion



Pretty neat. If you know what you're grasping, you can do things like activity recognition and such. Especially helpful when you start using smart rooms and offices, etc. Maybe even Information Oriented Programming (IOP)!

I think the pressure sensors really helped augment the CyberGlove, especially since there were so many grasp categories.

Nishino - Object modelling with gestures

Nishino, H., Utsumiya, K., and Korida, K. 1998. 3D object modeling using spatial and pictographic gestures. In Proceedings of the ACM Symposium on Virtual Reality Software and Technology (Taipei, Taiwan, November 02 - 05, 1998). VRST '98. ACM, New York, NY, 51-58. DOI= http://doi.acm.org/10.1145/293701.293708

Summary



Put on special glasses to get a 3d stereoscopic image from a curved screen, and put glove/motion tracker on your hands to track them. Have some virtual clay modelled by a superellipsoid (it's mathematically easy to work with, relatively). Create a blob, deform it, mash it, pinch it, stretch it, put it in a pan, bake it up as fast as you can. Combine a bunch of blobs to make things like teapots, vases, and bigger blobs.
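For reference, the standard superellipsoid inside-outside function (I'm assuming this general form is what they deform; their exact parameterization may differ). Values below 1 are inside the blob, and the two exponents control how boxy or pinched it looks:

# Standard superellipsoid inside-outside function: f < 1 inside, f > 1 outside.
# a, b, c are the radii; e1 controls squareness along z, e2 in the x-y plane.

def superellipsoid(x, y, z, a=1.0, b=1.0, c=1.0, e1=1.0, e2=1.0):
    xy = (abs(x / a) ** (2.0 / e2) + abs(y / b) ** (2.0 / e2)) ** (e2 / e1)
    return xy + abs(z / c) ** (2.0 / e1)

print(superellipsoid(0.5, 0.5, 0.5))                    # sphere case, inside
print(superellipsoid(0.5, 0.5, 0.5, e1=0.2, e2=0.2))    # boxier blob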

Discussion



Good for professional sculptors who might want to fashion something without wasting real clay. But since clay is easy to recycle (just add water), who cares? If you're not a sculptor, are you good enough with your hands to make your blobs of junk look like things in real life? How accurately are the hands tracked, so that a noisy spike doesn't accidentally mash your teapot into oblivion?

Pretty neat idea, just not sure of its usefulness.

Campbell - Invariant Features, Tai-Chi

Campbell L W, Becker D A, Azarbayejani A, Bobick A F, and Pentland A, Invariant Features for 3-D Gesture Recognition, Proc. of FG'96 (1996) 157-162.

Summary



Look at a series of gestures occurring in Tai-Chi captured by video. Extract a lot of features about the gestures, including plain (x,y,z) coordinates, velocities for these coords, polar coordinates, polar velocities. Do each of these with and without head data (always with hand data). Plug all the different sets of features into an HMM and see which feature set does the best. Polar velocity with no head does the best at about 95% accuracy overall. Plain (x,y,z) does the worst at about 34% overall.
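The winning feature set is easy enough to reconstruct in spirit: take hand positions relative to the body, convert to spherical coordinates, and difference them frame-to-frame. The exact reference frame and units in the paper may differ; this is just the gist:

import numpy as np

# Gist of the best feature set: body-relative hand positions in spherical
# coordinates, differenced over time to get "polar velocities".

def to_spherical(xyz):
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    r = np.sqrt(x**2 + y**2 + z**2)
    theta = np.arccos(np.clip(z / np.maximum(r, 1e-9), -1.0, 1.0))  # inclination
    phi = np.arctan2(y, x)                                          # azimuth
    return np.stack([r, theta, phi], axis=1)

def polar_velocity_features(hand_xyz, torso_xyz):
    rel = hand_xyz - torso_xyz          # body-relative positions, shape (T, 3)
    return np.diff(to_spherical(rel), axis=0)   # per-frame polar velocities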

Discussion



Just take a bunch of features and string them all together. Perform a standard feature extraction/selection algorithm. Get a set of features that probably outperforms all your sets.

Win.

This paper isn't interesting, really, as it just shotguns a bunch of features into an HMM and sees who wins.

Fail.

Wesche - FreeDrawer

Wesche, G. and Seidel, H. 2001. FreeDrawer: a free-form sketching system on the responsive workbench. In Proceedings of the ACM Symposium on Virtual Reality Software and Technology (Banff, Alberta, Canada, November 15 - 17, 2001). VRST '01. ACM, New York, NY, 167-174. DOI= http://doi.acm.org/10.1145/505008.505041

Summary



Electronic pen you can use to draw in 3D space. To make things simpler for their algorithm, you're restricted to spline curves. You trace out the general curve with the pen and the computer calculates the parameters of the spline. You can draw curves, modify them, connect curves together to form a network, fill in surfaces between curves. You wear wonky VR goggles to see what you're drawing.
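The curve-fitting step is the interesting part; here's the general idea using scipy's parametric spline fit as a stand-in for whatever fitting they actually do:

import numpy as np
from scipy.interpolate import splprep, splev

# Stand-in for their fitting: sample points along the traced pen stroke, fit a
# smoothing parametric B-spline, and keep only the spline parameters (tck).
# scipy's splprep is my substitute, not necessarily what they use.

t = np.linspace(0, 2 * np.pi, 50)
x, y, z = np.cos(t), np.sin(t), 0.1 * t          # a fake 3D pen stroke

tck, u = splprep([x, y, z], s=0.01)              # knots + control points
curve = splev(np.linspace(0, 1, 200), tck)       # evaluate the smooth curve

Only the spline parameters need to be stored or shared afterward, which is the payoff they're after.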

Discussion



There's a tradeoff between user freedom (virtual clay) and performance, and they choose performance by limiting the user's drawing style (restricted to splines). They claim splines are a good choice because they have a closed-form representation, are easily transferable (only the spline parameters, not every voxel, need to be transmitted), and are computationally cheap (storing every voxel for virtual clay is expensive).

They admit you need an artistic flair and a little bit of training to get used to using the splines. Well then why not just train on a CAD system? Isn't the point to offer an intuitive interface with no need for training or restrictions? Plus, if you use CAD, you don't have to use /just/ splines, can be precise and exact, and don't have to wear wonky 3D goggles.

Poddar - Gesture Speech, Weatherman

I. Poddar, Y. Sethi, E. Ozyildiz, R. Sharma. Toward Natural Gesture/Speech HCI: A Case Study of Weather Narration. Proc. Workshops on Perceptual User Interfaces, pages 1-6, November, 1998.

Summary



Three categories of gestures: pointing, area, and contour, each with three phases: preparation, making the gesture, and retraction. Use features that measure distances and angles between the face and hands and plug them into an HMM. Get 50-60% accuracy on four test sequences.

Now add speech to the gesture data. Compute co-occurrences of marker words with different gestures and use the data to help the HMM classify gestures. Accuracy goes up about 10%.
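Here's my simplified take on the fusion step (the keyword list and smoothing are mine): count keyword/gesture co-occurrences in training, turn them into a prior, and rescale the HMM likelihoods with it at decision time.

from collections import Counter

# Simplified fusion: keyword/gesture co-occurrence counts become a prior that
# rescales the per-gesture HMM likelihoods. Keywords and smoothing are made up.

GESTURES = ["point", "area", "contour"]

def keyword_priors(training_pairs, smoothing=1.0):
    """training_pairs: (keyword, gesture_label) tuples from aligned narration."""
    counts = Counter(training_pairs)
    priors = {}
    for kw in {k for k, _ in training_pairs}:
        total = sum(counts[(kw, g)] + smoothing for g in GESTURES)
        priors[kw] = {g: (counts[(kw, g)] + smoothing) / total for g in GESTURES}
    return priors

def fuse(hmm_likelihoods, keyword, priors):
    """Rescale per-gesture HMM likelihoods by the keyword's co-occurrence prior."""
    prior = priors.get(keyword, {g: 1.0 / len(GESTURES) for g in GESTURES})
    scores = {g: hmm_likelihoods[g] * prior[g] for g in GESTURES}
    return max(scores, key=scores.get)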

Discussion



Adding speech to gesture data improves the accuracy. This is fairly obvious, and they've shown that it helps a little bit. The one thing I don't like is the manual labeling of the speech data.

I wish they had done more gestures, and their accuracies weren't great. But at least it was a fusion of contextual data.