Tuesday, September 18, 2007

Yu - Corners and Shape recognition

Yu, Bo, and Shijie Cai. "A Domain-Independent System for Sketch Recognition." 2003.

Summary



The authors present work that builds directly on Sezgin's basic corner detection algorithms. Their system recognizes primitives (line segments, arcs, circles...) and puts them together via basic object recognition. The deficiencies in current systems they wish to address are strict restrictions on drawing style and the inability to decompose curves (smooth or hybrid) into constituent primitives. To this end, they aim to let users draw naturally (multiple strokes are fine), provide consistent recognition results, group primitives hierarchically, and beautify noisy strokes into higher-level shapes.

First, the system performs stroke approximation immediately after the stroke is drawn. The points of the stroke, consisting of (x, y, timestamp) tuples, are examined, and direction and curvature graphs are computed, very similar to Sezgin's methods (the curvature measure here is slightly different). Vertices are declared to be the peaks in curvature. To approximate a stroke with primitives, the system first attempts to fit the entire stroke with a single primitive. If this fails, the stroke is split into substrokes at the point of highest curvature, and approximation recurses on the substrokes until every portion of the stroke is approximated by a primitive.
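
To make the split-and-fit loop concrete, here is a minimal Python sketch of my reading of it (not the authors' code). The fits_primitive function is a hypothetical stand-in for the line and curve tests described below, and the curvature measure is simplified (the paper normalizes it over a neighborhood, as discussed further down).

```python
import math

def direction_graph(points):
    """Direction of travel at each step of an (x, y, timestamp) stroke."""
    return [math.atan2(points[i + 1][1] - points[i][1],
                       points[i + 1][0] - points[i][0])
            for i in range(len(points) - 1)]

def curvature_graph(directions):
    """Magnitude of direction change at each interior point (simplified)."""
    return [abs(directions[i + 1] - directions[i])
            for i in range(len(directions) - 1)]

def approximate(points, fits_primitive):
    """Try to fit the whole stroke with one primitive; otherwise split at the
    curvature peak and recurse on the two substrokes."""
    primitive = fits_primitive(points)
    if primitive is not None or len(points) < 4:
        return [primitive]
    curv = curvature_graph(direction_graph(points))
    split = curv.index(max(curv)) + 1  # index into points, away from endpoints
    return (approximate(points[:split + 1], fits_primitive) +
            approximate(points[split:], fits_primitive))
```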

Line approximation occurs first. A stroke is checked for being a line by examining the direction graph (which should be horizontal--no changes in direction) and the actual points in the stroke (which should fit a straight line). If either criterion produces a best-fit line (using a least-squares fit) with few "deviations", a line is fit between the stroke endpoints. Valid fits are those that minimize the ratio of feature area to standard area.
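
As a rough sketch of how I picture the line test (not the paper's exact formulation): fit a segment between the endpoints and accept it if the area between the stroke and that segment is small relative to a "standard area", which I stand in for here with the squared segment length. The threshold value is my assumption, not a number from the paper.

```python
import math

LINE_RATIO_THRESHOLD = 0.02  # assumed, not from the paper

def is_line(points):
    """points: list of (x, y, timestamp). Feature-area test against the chord."""
    (x0, y0), (x1, y1) = points[0][:2], points[-1][:2]
    length = math.hypot(x1 - x0, y1 - y0)
    if length < 1e-9:
        return False
    feature_area = 0.0
    for (ax, ay, _), (bx, by, _) in zip(points, points[1:]):
        # Perpendicular distance of each small step's midpoint from the chord,
        # times the step length, approximates the area enclosed by the stroke.
        mx, my = (ax + bx) / 2.0, (ay + by) / 2.0
        dist = abs((y1 - y0) * mx - (x1 - x0) * my + x1 * y0 - y1 * x0) / length
        feature_area += dist * math.hypot(bx - ax, by - ay)
    return feature_area / (length ** 2) < LINE_RATIO_THRESHOLD
```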

If a stroke is not a line, it is fed to the curve approximation routine. The direction graph for a smooth curve should be linear (constant change in direction). Their system can fit circles, ellipses, and arcs. It can even fit overtraced curves (where the length of the direction graph exceeds 2*pi) by splitting the direction graph into lengths of 2*pi and fitting each length separately. If the resulting set of circles is similar, they are averaged into one; if the radii steadily decrease, the stroke is turned into a helix. Like lines, circles and arcs are fit by computing the ratio of feature area to standard area, seeking to minimize this ratio.
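
Here is a hedged sketch of the circle test and the overtraced-stroke handling as I understand them. The centroid-based circle fit, the deviation ratio standing in for the feature-area ratio, and all tolerances are my assumptions; the directions list is the direction graph from the earlier sketch.

```python
import math

CIRCLE_RATIO_THRESHOLD = 0.1  # assumed

def fit_circle(points):
    """Crude circle estimate: centroid as center, mean distance as radius."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    r = sum(math.hypot(p[0] - cx, p[1] - cy) for p in points) / len(points)
    return cx, cy, r

def is_circle(points):
    cx, cy, r = fit_circle(points)
    # Mean deviation from the fitted radius, relative to the radius itself,
    # as a stand-in for the paper's feature-area / standard-area ratio.
    deviation = sum(abs(math.hypot(p[0] - cx, p[1] - cy) - r)
                    for p in points) / len(points)
    return r > 0 and deviation / r < CIRCLE_RATIO_THRESHOLD

def split_overtraced(points, directions):
    """Cut an overtraced stroke into roughly one-turn (2*pi) pieces, each of
    which can then be fit with its own circle and the results compared."""
    pieces, start, turned = [], 0, 0.0
    for i in range(1, len(directions)):
        turned += abs(directions[i] - directions[i - 1])
        if turned >= 2 * math.pi:
            pieces.append(points[start:i + 1])
            start, turned = i, 0.0
    pieces.append(points[start:])
    return pieces
```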

Post processing includes removing noisy points (that occur at the beginning or end of a stroke), merging similar strokes, and inferring constraints such as connectivity, parallelism, and perpendicularity. Sets of heuristics can be used to combine primitives (such as four line segments) into higher level objects (such as rectangles).
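
As a toy illustration of the kind of heuristic they describe (my construction, not the paper's actual rules): four roughly connected segments whose directions alternate parallel/perpendicular get grouped into a rectangle. The tolerances are assumptions.

```python
import math

ANGLE_TOL = math.radians(15)   # assumed
ENDPOINT_TOL = 10.0            # pixels, assumed

def segment_angle(seg):
    (x0, y0), (x1, y1) = seg
    return math.atan2(y1 - y0, x1 - x0) % math.pi  # undirected angle

def angle_diff(a, b):
    """Smallest difference between two undirected angles."""
    d = abs(a - b) % math.pi
    return min(d, math.pi - d)

def is_rectangle(segments):
    """segments: four ((x0, y0), (x1, y1)) line segments in drawing order."""
    if len(segments) != 4:
        return False
    angles = [segment_angle(s) for s in segments]
    # Opposite sides roughly parallel, adjacent sides roughly perpendicular.
    parallel = (angle_diff(angles[0], angles[2]) < ANGLE_TOL and
                angle_diff(angles[1], angles[3]) < ANGLE_TOL)
    perpendicular = abs(angle_diff(angles[0], angles[1]) - math.pi / 2) < ANGLE_TOL
    # Consecutive endpoints roughly connected.
    connected = all(
        math.hypot(segments[i][1][0] - segments[(i + 1) % 4][0][0],
                   segments[i][1][1] - segments[(i + 1) % 4][0][1]) < ENDPOINT_TOL
        for i in range(4))
    return parallel and perpendicular and connected
```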

Experimentally, their system is able to identify primitive shapes and polylines with 98% accuracy, and arcs alone with 94% accuracy (crude arcs are incorrectly recognized as a set of lines and arcs). Higher level shapes combining primitives were only recognized at about 70%, with problems encountered at smooth connections (since their method /only/ uses curvature).

Discussion



It seems like their definition of curvature is different from Sezgin's, normalizing the change in direction over a neighborhood by the stroke length. I wonder how this compares to Sezgin's as far as vertex prediction goes. I was also surprised they did not use speed data; I think Sezgin made a good case for using both speed and curvature.
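
For concreteness, this is my reading of their curvature measure, reconstructed from the description above (so treat it as an approximation): the direction change across a neighborhood, divided by the path length of the stroke inside that neighborhood, rather than the raw angle change.

```python
import math

def curvature_at(points, directions, i, k=2):
    """Direction change over a k-point neighborhood, normalized by the
    stroke length covered by that neighborhood."""
    lo, hi = max(0, i - k), min(len(directions) - 1, i + k)
    angle_change = abs(directions[hi] - directions[lo])
    path_length = sum(math.hypot(points[j + 1][0] - points[j][0],
                                 points[j + 1][1] - points[j][1])
                      for j in range(lo, hi))
    return angle_change / path_length if path_length > 0 else 0.0
```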

This paper is neat in that it uses primitive approximations to drive vertex detection, rather than finding vertices and then trying to fit primitives to the spaces between them. I think speed data could and should have been used in some sort of hybrid method like Sezgin's, where instead of Sezgin's "just add until we get below a certain error", we pick the vertex with the highest certainty metric and use it to divide the stroke, trying to fit primitives to the substrokes. Basically, a combination of the two methods.
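
Something like the following sketch is what I have in mind (my idea, not either paper's method), assuming per-point curvature and speed values, with speed normalized to [0, 1], are supplied by preprocessing.

```python
def hybrid_approximate(points, fits_primitive, curvature, speed):
    """Split at the most certain corner candidate; recurse only if the whole
    stroke cannot be fit with a single primitive."""
    primitive = fits_primitive(points)
    if primitive is not None or len(points) < 4:
        return [primitive]
    # Certainty: high curvature and low speed both argue for a corner.
    scores = [curvature[i] * (1.0 - speed[i]) for i in range(len(points))]
    split = max(1, min(len(points) - 2, scores.index(max(scores))))
    return (hybrid_approximate(points[:split + 1], fits_primitive,
                               curvature[:split + 1], speed[:split + 1]) +
            hybrid_approximate(points[split:], fits_primitive,
                               curvature[split:], speed[split:]))
```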

Again, we have a lot of talk about thresholds and limits without describing what these are, how they were derived, and why they are valid. I found it amusing that they say "here [looking at curvature data] we avoid any empirical threshold or statistical calculation for judging vertices." Well sure, because they push all that hand-waving down into the primitive approximation processes, which, in the end, still equates to judging vertices. Examples of thresholds not well defined:

  • Line approx - number of points "deviating" from best-fit line
    • What does it mean to deviate?
    • What's the "relatively loose" threshold?

  • Cleanup - "very short" line segments for deletion/merging, how short?

1 comment:

Grandmaster Mash said...

The threshold for cleaning up strokes is a big issue. I think the threshold I've seen before was if the shorter stroke was less than 10% of the longer stroke's length. This can remove hooks from endpoints, but if the strokes are relatively long a significant portion of the sketch can be removed. For instance, the narrow rectangle example we saw in one of these papers would cause issues.