Summary
Paulson and Hammond construct a system for recognizing low-level primitives made with single strokes, combining the primitives together in a hierarchical fashion and beautifying the strokes. Previous methods at this type of work used user- and style-dependent features (e.g., Rubine, Long) that required many training examples. Corner-detection methods (e.g., Sezgin, Yu and Cai) are able to find primitives but not construct complex hierarchies out of them for beautified strokes. The authors' improvements are adding two new features, plus a ranking system to choose the best of many possible fits to complex strokes.
The first stage of their system is pre-recognition computations, namely of features, overtrace detection, and stroke closure detection. The first of the two new features that are introduced are normalized distance between direction extremes (NDDE), which measures the ratio of the stroke that lies between direction extrema (arcs are high, polylines with many changes tend to be lower). The second new feature is direction change ratio (DCR), which is the ratio of maximum direction change to average direction change (similar to acceleration, polylines change often and dramatically and are high, arcs are smoother and are low). Testing for overtracing is performed by looking for constant direction for > 2*PI length. Stroke closure is tested by making sure the endpoints are "close enough" together.
The strokes are then fed into a series of primitive recognizers to test for lines, polylines, ellipses, circles, arcs, curves (Bezier of order < 5), spirals, and helixes. Complex shapes are constructed if none of the above tests for primitives pass by splitting the stroke on highest curvature point and recursively feeding the substrokes into the primitive recognizer (similar to Yu). Once a complex fit is obtained, the substrokes are examined to see if any can be recombined. The interpretations of the primitives/complex representation that are deemed to fit the stroke are ranked according to complexity. The ranking procedure tends to choose simpler models (in the number of primitives required to construct it), based on empirically-set costs for certain primitives.
Their system runs very quickly compared to a modified Sezgin recognizer, taking only half the time, on average, to classify a stroke (but both are very fast). Their method also achieved an impressive 98-99% accuracy classifying test shapes. their difficulties arose when the system had to decide between an accurate polyline fit and a simpler model of fewer primitives. Their method consistently outperformed Sezgin's algorithm, and is able to classify many more types of strokes.
Discussion
First, it's nice to see a paper that clearly defines their thresholds and constants. Second, it's nice to see a paper that gets such high accuracy levels on many types of primitives and combinations into complex shapes. I was starting to get discouraged that the problem of sketch recognition was 'too hard.'
The authors mention detecting overtraced circles. I wonder about overtraced lines, say if the user wanted to make there there was a connection between two objects. I assume that currently these would be treated like a polyline that just curves back and forth on itself. I wonder how difficult it would be to detect overtraced lines and construct 'heavy' lines out of them.
Also, the authors have no detection methods for anything rectangular. It would be nice to see 4 polylines that could be created into a square or rectangle. Furthermore, it seems like the application of the hierarchical fitting system could be extended to arbitrary regular polygons. Of course, the difficulty arises in deciding how many "automatic" shapes are too many to compare against. However, if the classification is hierarchical, perhaps these tests would only be performed if the stroke was closed, and then matched to the appropriate n-gon depending on the number of sub-strokes (lines in the polyline).
I thought the ranking system was very unique. I felt this was one of the best points of the paper and by the results in Table 1, it provides more accuracy by itself than just using the new features. I wonder if there is a different way to compute ranking costs other than empirical observation. Perhaps not, since judgments for accuracy are going to be empirical anyway (does the user think it looks right).
1 comment:
In regard to your comment about the system lacking recognition for rectangular shapes in specific, I actually wanted to bring that up in class. I ended up not doing so since I didn't think anyone else shared this view. The only counter-argument I could come up with the system omitting rectangles as a primitive is when a user intended a drawing with 4 polylines to be something interpreted other than a rectangular, like another kind of quadrilateral. But I figured that with rectangles being a popular shape to work with, I don't see too much harm having rectangles and squares as primitives in the same sense ellipses and circles are.
Post a Comment