Summary
Wobbrock et al. present a very simple single-stroke recognizer that uses a template-based approach, named because it's cheap and easy to create. Their method works in four basic steps.
First, the stroke is resampled. Given a parameter N, the number of points in the resampled stroke, and the stroke length L, $1 interpolates N points (possibly new) along the original stroke path spaced L/N units apart. This ensures a uniform representation for fast/slow strokes to be compared to each other (fast strokes are less dense than slower strokes of the same length).
Second, the resampled stroke is rotated so that comparisons are more or less rotationally invariant. The angle of rotation is computed from the line segment connecting the center of the stroke's bounding box and the first point of the stroke, rotated to the horizontal axis at 0 degrees.
Third, the resampled and rotated points are scaled so that the bounding box is equal to a reference square of a certain size, disregarding original aspect ratio. The points are then translated so that the centroid (mean along x and y) is moved to the origin (0,0). These points are meant to make the strokes more scale invariant.
Fourth, the stroke is compared to each template using a series of small rotations to find the template that matches the best. The rotations are made between a min and max angle threshold using the golden ratio. The score rating the match between stroke and template is the average distance between all corresponding points of the stroke and template divided by half the length of the bounding box diagonal (a limit of the path distance, can't be further away from stroke to template than half the bounding box), subtracted from 1 (so higher scores are better). The templates are ranked according to score and returned.
The authors compared 10 entries of 16 gestures using their recognizer and variations of Rubine and a dynamic time warping algorithm. DTW and $1 perform almost indistinguishably, but $1 is much simpler and runs much faster.
Discussion
First my beef with the title: no training or libraries? What do you call the templates, one per possible type of shape you want to recognize? I understand their hyperbole, but it seems a little ridiculous, even for UIST.
It's important to remember the purpose of this recognizer. This is not meant to be the next groundbreaking method that pushes 99.99% accuracy in every possible domain. Even their title belies this admission on their own behalf: "for User Interface Prototypes." This, I believe, is meant to be something that you can just slap together an hour before a customer/investor comes over to get decent recognition results. The great part about this algorithm is that the entire source code is listed in the appendix, taking most of the mythos and shrouded mystery out of sketch recognition (no hidden Markov models or Bayes nets to code).
2 comments:
Is sketch recognition possible without any sort of training or libraries? I still believe that this paper didn't belong in UIST. This is not saying it is a bad paper (I like the algorithm being given, without the dreaded missing thresholds).
I'm slowly being convinced that this paper is not as cool as I thought. My previous arguments were that this paper would be an excellent intro paper for undergrads into SR. Now I think this paper should not be used for the next SR course, for fear that it will confuse the little ones. I do think that the paper's contributions weren't a complete lost with the code added at the end. Maybe it would be better if they had spent another dollar on the recognizer.
Post a Comment