Summary
The system uses accelerometers and gyroscopes to capture data from a moving hand. It computes 2D acceleration vectors in the XY, XZ, and YZ planes. One feature is the sum of changes in acceleration, another is the rotation of the acceleration, and a third is the aspect ratio of the two unit components of each 2D acceleration vector (that is, which acceleration component is larger). Eight more features give the distribution of acceleration over eight principal directions separated by pi/4. These 11 features are computed for each of the three planes, giving 33 features per gesture. The mean and standard deviation of each feature are computed, and an input is classified as the gesture with the lowest weighted error (the sum of squared differences from the mean, divided by the standard deviation).
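To make the eight-direction features and the weighted-error rule concrete, here is a rough Python sketch. It is my own reading of the summary, not the authors' code. The function names, the choice to weight each direction by acceleration magnitude, the normalization, and the `eps` guard against a zero standard deviation are all assumptions. The error is computed literally as squared difference divided by standard deviation, which is how the description reads to me.

```python
import math

def direction_histogram(vectors):
    """Share of total acceleration magnitude in each of 8 directions.

    Assumption: each 2D vector votes for its nearest multiple of pi/4,
    weighted by its magnitude.
    """
    bins = [0.0] * 8
    for a, b in vectors:
        magnitude = math.hypot(a, b)
        if magnitude == 0.0:
            continue  # a zero vector has no direction
        # Nearest multiple of pi/4, wrapped into the range 0..7
        k = round(math.atan2(b, a) / (math.pi / 4)) % 8
        bins[k] += magnitude
    total = sum(bins)
    return [v / total for v in bins] if total else bins

def weighted_error(features, means, stds, eps=1e-9):
    """Sum of squared differences from the mean, each divided by the std."""
    # eps is a hypothetical guard so a zero std does not divide by zero
    return sum((x - m) ** 2 / max(s, eps)
               for x, m, s in zip(features, means, stds))

def classify(features, models):
    """Return the gesture name whose (means, stds) give the lowest error."""
    return min(models, key=lambda name: weighted_error(features, *models[name]))
```

In use, `models` would map each gesture name to the per-feature means and standard deviations learned from training examples, and `features` would be the 33-value vector for the gesture being classified.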
They look at the data to find where maxima in acceleration occur. These mark the points where the conductor changes direction, and so mark off the tempo beats. To smooth the computer's performance against the changing, noisy tempo beats a human produces, the system uses linear prediction to guess the upcoming tempo. A parameter sets how much the system relies on the human versus how much it smooths out the noisy tempo beats.
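A minimal sketch of the beat marking and smoothing idea, again as I understand it rather than as the paper implements it. The names `detect_beats`, `next_beat_interval`, `threshold`, `reliance`, and `window` are mine. Note that the smoothing here is a plain blend between the latest beat interval and the average of recent intervals. That is a stand-in for the paper's linear prediction, not the real thing.

```python
def detect_beats(times, accel, threshold=0.0):
    """Return the times at which acceleration has a local maximum.

    Assumption: a beat is any sample above `threshold` that is larger
    than the sample before it and at least as large as the one after.
    """
    beats = []
    for i in range(1, len(accel) - 1):
        if (accel[i] > threshold
                and accel[i] > accel[i - 1]
                and accel[i] >= accel[i + 1]):
            beats.append(times[i])
    return beats

def next_beat_interval(beat_times, reliance=0.5, window=4):
    """Guess the time until the next beat.

    reliance=1.0 follows the conductor's latest interval exactly;
    reliance=0.0 uses only the average of the last `window` intervals.
    """
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    if not intervals:
        raise ValueError("need at least two beats")
    recent = intervals[-window:]
    smoothed = sum(recent) / len(recent)
    return reliance * intervals[-1] + (1 - reliance) * smoothed
```

With beats at 0, 1, 2, and 3.5 seconds, a reliance of 1.0 predicts the next interval as 1.5 seconds (the conductor's last one), while lower values pull the prediction back toward the steadier recent average.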
Discussion
They don't really explain their features well. Furthermore, they go on at length about the rotation feature and then say they don't use it. Well, big deal, then. Why list rotation as a feature?
They're not doing gesture recognition, just marking tempo beats using changes in acceleration. They don't need 33 features for this. They need 3: acceleration in X, Y, and Z. The rest are linearly dependent on the data. They can predict tempo fairly accurately, but I'm not that impressed.