Summary
Li and Prabhakaran propose a new gesture classification algorithm that generalizes easily to many input methods. They work with the SVD of a motion matrix: a matrix whose columns are the features of the data (e.g., the joint measurements from a CyberGlove) and whose rows are steps through time. The SVD yields a set of singular vectors and values for the matrix; equivalently, they compute the eigenvectors and eigenvalues of M = A'A (where A is the motion matrix), which is cheaper to work with. The top k eigenvectors are used, where k is a parameter, with empirical evidence supporting k = 6 as enough to perform well. To compare two motion matrices, corresponding eigenvectors are compared via their dot products (the cosine of the angle between the vectors), weighted by the relative magnitudes of the associated eigenvalues. A value of 0 means the matrices have nothing in common, as all the eigenvectors are orthogonal; a value of 1 means the matrices have collinear eigenvectors. They call this kWAS, k Weighted Angular Similarity (for the k eigenvectors and the weighted dot-product/cosine metric).
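To make that concrete, here's a rough NumPy sketch of what a kWAS-style comparison might look like. The function names and the exact weighting scheme are my own guesses from the description above, not lifted from the paper:

import numpy as np

def kwas(A, B, k=6):
    # Sketch of a kWAS-style similarity between two motion matrices.
    # Rows of A and B are time steps; columns are features (e.g., glove joints).
    # NOTE: the weighting below is my guess from the paper's description;
    # the exact formula in the paper may differ.
    def top_eigs(X):
        M = X.T @ X                         # feature-by-feature matrix A'A
        vals, vecs = np.linalg.eigh(M)      # eigenvalues in ascending order
        order = np.argsort(vals)[::-1][:k]  # keep the top-k
        return vals[order], vecs[:, order]

    la, Ua = top_eigs(A)
    lb, Ub = top_eigs(B)
    # Weight each eigenvector pair by its share of the combined eigenvalue mass.
    w = (la + lb) / (la.sum() + lb.sum())
    # |u_i . v_i| = cosine of the angle between corresponding eigenvectors
    # (absolute value, since eigenvectors are only defined up to sign).
    cosines = np.abs(np.sum(Ua * Ub, axis=0))
    return float(np.dot(w, cosines))        # 0 = orthogonal, 1 = collinear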
Their algorithm works as follows. Start with a library of gesture matrices P and precompute their eigenvectors and eigenvalues. Then watch the stream of incoming data, segmenting it with minimum length l and maximum length L, stepping through the stream with step size δ. Take each candidate chunk of the stream as a matrix Q and compare it against every P. The (Q, P) pairing with the highest kWAS score is selected as the correct answer, and scanning resumes from the end of the segment with the max score.
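Here's a rough Python sketch of that scanning loop, reusing the kwas() function from the sketch above. The window bounds and step size are placeholder values, not settings from the paper:

def segment_and_classify(stream, library, k=6, l_min=30, l_max=120, delta=5):
    # Sketch of the sliding-window loop described above.
    # stream:  (T, d) array of incoming frames.
    # library: dict of gesture name -> motion matrix P (each d features wide).
    # l_min, l_max, delta are the min/max window lengths and step size;
    # the values here are placeholders, not anything from the paper.
    results = []
    start = 0
    while start + l_min <= len(stream):
        best_name, best_score, best_end = None, -1.0, start + l_min
        for end in range(start + l_min, min(start + l_max, len(stream)) + 1, delta):
            Q = stream[start:end]
            for name, P in library.items():
                score = kwas(Q, P, k)  # kwas() from the sketch above
                if score > best_score:
                    best_name, best_score, best_end = name, score, end
        results.append((best_name, best_score, (start, best_end)))
        start = best_end  # resume from the end of the best-scoring segment
    return results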
They report that their algorithm recognizes CyberGlove gestures with 99% accuracy at k = 3, and motion capture gestures with 100% accuracy at k = 4. It isn't clear what these figures mean, though; in particular, whether they apply to isolated patterns or to continuous streams.
Discussion
So their method isn't really for segmentation. They still just look at different sliding windows of data and pick the one that works best. It does work without requiring holding positions or neutral states, which many other systems impose on users to delineate gestures. However, Iba et al.'s system can do the same thing using hidden Markov models with a built-in wait state.
However, as far as a new classification approach is concerned, this is a nice one because it seems to give decent results and is not yet another HMM.
They never say how they pick δ. I wonder how different values affect the accuracy and running time of the algorithm.
Some people might be concerned that once you reduce a motion matrix to its eigenvectors, you lose temporal information. I can see where this would be a concern for some applications. However, most of the time you can get good classification/clustering results without perfect temporal information. It can even be the case that temporal information confuses the issue, making things harder to compute and compare.
BibTeX
@inproceedings{1133901,
  author    = {Chuanjun Li and B. Prabhakaran},
  title     = {A similarity measure for motion stream segmentation and recognition},
  booktitle = {MDM '05: Proceedings of the 6th International Workshop on Multimedia Data Mining},
  year      = {2005},
  isbn      = {1-59593-216-X},
  pages     = {89--94},
  location  = {Chicago, Illinois},
  doi       = {10.1145/1133890.1133901},
  publisher = {ACM},
  address   = {New York, NY, USA},
}
2 comments:
Hi there,
I see you are taking the 'gesture segmentation' problem quite seriously. Let me know when you solve it okay :-)
Did you check this one:
Simultaneous gesture segmentation and recognition based on forward spotting accumulative HMMs
By Kim, Song, and Kim, 2007
Oh, neat. Yet another non-classmate posting to your blog. Your blog sure is popular with the outsiders.
Back on topic, I agree with your reasoning about temporal info complicating matters. If temporal information really were that important for a particular domain, I wonder if this technique could be augmented with another one that uses the temporal data from the raw stream itself? Hmm...