Thursday, September 13, 2007

Early Processing for Sketch Understanding

Sezgin, Tevfik Metin, et al. "Sketch BAsed Interfaces: Early Processing for Sketch Understanding." PUI 2001.

Summary



Sezgin et al seek to develop a system that allows users to directly draw sketches without restrictions (number of strokes, direction of strokes), and to have those sketches "understood" (approximation phase) into their component lines and arcs. The system can then combine these into low-level primitives (rectangles, squares, circles, etc.) as appropriate, beautify the drawing (second phase), and perform higher level understanding on the primitives (basic recognition phase).

Their approximation subsystem, which divides strokes into line segments and arcs, works by detecting 'corners', or where two line segments connect. Corners are detected by examining the speed (intuitively, users tend to slow down when they draw meaningful corners) and curvature (a corner is, obviously, a change in direction) of the sketch. Average based filtering is used to detect speed minima (low speeds denote corners), F_s, and curvature maxima (high change in direction denotes corners), F_d. The set of vertices that denote corners in the drawing is compiled from the sets F_d and F_s using an iterative technique that adds the most "certain" vertex from each set (certainty is computed differently for speed and curvature, and tells approximately how 'significant' the values of speed and curvature are at those points), picking between the most certain vertices using least squared error fitting. The iterations continue, adding points by both speed and curvature, until the error of the fit drops below a certain threshold.

Arcs are fit with Bezier curves (consisting of two endpoints and two control points). First, arcs are detected by looking at the distance between consecutive vertices (from corner detection) and computing a ratio with the total arc distance (sum of the distance between all the points between the vertices). Straight segments will have a ratio close to 1, while arced segments will have a ratio much higher than one (since the space between vertices is not a straight line, like the computed distance between vertices). Control points are then calculated based on the tangent vectors to the arc.

One lines and arce have been identified and represented using vertices and Bezier control points, the program can be beautified by not only drawing lines straighter, but also by rotating lines that are supposed to be parallel until they actually are. Beautified sketches can be compared to templates to perform basic object recognition.

They evaluated their system subjectively, using human subjects to say whether they liked the system or not (9/10 said they did). The authors also claim a 96% success rate in determining true vertices.

Their work is an improvement on other work because the vertex recognition is automatic, it allows for more natural sketching (the user does not have to lift the pen, stop drawing, or press buttons to denote vertices), and is completely automatic. In the future, the authors hope to use machine learning techniques (like scale space theory) to reduce the need for hard-coded thresholds. They also want to explore intentional changes in speed by the user as an indication of how precise they wish their sketch to be (requiring less beautification, presumably, for a careful sketch).

Discussion



There are so many details that are omitted, it's crazy! I understand the need to be concise and fit things into a specified number of pages, but these seem like major operating details.


  1. What is the error threshold to stop adding points to the final set of vertices, H_f? Why is this value hard-coded and not some sort of statistical test?

  2. In the average based filtering, why do we hard-code the mean threshold? They do cite this as a need for scale space theory, and they want to be simple, so it's understood.
  3. They talk about fitting an arc to a segment if the ratio of stroke length / euclidean distance is significantly higher than one. Significantly how and how much?
  4. When they rotate line segments to make things parallel/perpendicular, what happens to vertices that no longer touch (line segments don't overlap)? Or do they change connected lines at the same time?
  5. Why not beautify curves? Or do they? They weren't clear if they did or not, I don't think.

No comments: