Wednesday, December 12, 2007

Alvarado -- SketchREAD

Alvarado, Christine, and Randall Davis. "SketchREAD: A Multi-Domain Sketch Recognition Engine." Proceedings of UIST '04, ACM, 2004.

Summary



Alvarado presents a new engine for sketch recognition built on dynamically constructed Bayesian networks. Incoming strokes are parsed, as they are drawn, into sets of low-level primitives using Sezgin/Stahovich's toolkit. These primitives feed Bayesian networks that give the probability that the strokes form certain shapes. Templates of predefined network fragments are instantiated and linked together to match the primitives and connect them in a logical fashion. The networks are interconnected into larger networks that yield probabilities for high-level shapes and for what the authors call domain shapes and domain patterns.
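To make the bottom-up step concrete, here is a toy sketch of template-based hypothesis scoring. SketchREAD's real engine performs proper inference over dynamically constructed Bayesian networks; the stand-in below collapses that to a naive product of a prior and per-primitive fit scores, and every name in it is invented.

```python
# Toy illustration of template-based shape scoring -- NOT Alvarado's actual
# implementation. A template matches a sequence of primitives and yields an
# unnormalized belief in the corresponding shape hypothesis.

from dataclasses import dataclass

@dataclass
class Primitive:
    kind: str         # e.g. "line", "arc", "ellipse"
    fit_error: float  # how poorly the stroke matched this primitive (0 = perfect)

@dataclass
class ShapeTemplate:
    name: str
    parts: list   # primitive kinds the shape is built from, in order
    prior: float  # belief in the shape before seeing any evidence

    def score(self, primitives):
        """Crude stand-in for Bayesian-network inference: the hypothesis is
        viable only if the primitive kinds match, and its score decays with
        how poorly each stroke fit its primitive."""
        if [p.kind for p in primitives] != self.parts:
            return 0.0
        belief = self.prior
        for p in primitives:
            belief *= max(0.0, 1.0 - p.fit_error)
        return belief

quad = ShapeTemplate("quadrilateral", ["line"] * 4, prior=0.3)
strokes = [Primitive("line", 0.05), Primitive("line", 0.10),
           Primitive("line", 0.02), Primitive("line", 0.08)]
print(quad.score(strokes))  # unnormalized belief in the quad hypothesis
```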

The Bayesian networks are used for bottom-up recognition, where the primitives are pieced together into shapes, and also for top-down recognition. The top-down portion of the program allows the engine to compensate for shapes that have not yet been finished. It also allows the system to revisit primitives and shapes and reassess how they were interpreted. For instance, if a stroke is split into a 5-line polyline but the domain has no 5-line shapes, the top-down recognizer can prune the 5-line hypothesis and generate others to test, such as 2-, 3-, and 4-line hypotheses. Hypotheses are generated at each stage of recognition to see which interpretation of the strokes/shapes is most likely. To keep the state space from exploding, the system prunes hypotheses without enough support, but this sometimes removes correct hypotheses that simply have not had time to develop (e.g., the user is still drawing the finishing strokes). The top-down recognizer is expected to re-add these pruned-but-correct hypotheses later.
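The 5-line example is easy to picture in code. Below is a toy version (not SketchREAD's algorithm) of that top-down reinterpretation step: if a polyline's line count matches no domain shape, propose coarser segmentations by dropping interior corners. The domain here is made up and assumes shapes of 2-4 lines.

```python
# Toy top-down re-segmentation: a 5-line polyline hypothesis matches no
# domain shape, so generate corner subsets that yield explainable line counts.

from itertools import combinations

DOMAIN_LINE_COUNTS = {2, 3, 4}  # assumed domain: no shape uses 5 lines

def resegmentations(corners):
    """Yield subsets of interior corners whose resulting line count
    some domain shape could explain (k kept corners -> k + 1 lines)."""
    for k in range(len(corners) + 1):
        for kept in combinations(corners, k):
            if len(kept) + 1 in DOMAIN_LINE_COUNTS:
                yield kept

corners = (1, 2, 3, 4)  # a 5-line polyline has 4 interior corners
for hypothesis in resegmentations(corners):
    print(f"keep corners {hypothesis} -> {len(hypothesis) + 1} lines")
```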

The engine was tested on two sets of sketches: 10 family trees, which were relatively clean and simple, and 80 circuit diagrams, which were far more complex. On average, SketchREAD correctly identified 77% of the family tree shapes and 62% of the circuit shapes.

One of the reasons SketchREAD performed "poorly" is that it makes several limiting assumptions to help avoid state-space explosion when searching for hypotheses. The first is that all the lines in a polyline are treated as one shape, instead of trying each line separately. Additionally, the template networks were often matched to the wrong strokes, and the engine was not allowed to try all possible matches.

Discussion



This was a pretty neat approach. I like the way low-level errors were corrected later on. This seems like a must-have for any system that hopes to be widely successful. Of course, with better low-level recognizers, like Paulson's, this becomes less of an issue. But still, nothing is perfect, as Alvarado mentions.

There were several limitations in the system that I wish Alvarado could get around. I understand they're there to limit the size of the hypothesis space, but they ended up making things hard for her to get right. One of them was that all the lines in a polyline were considered to be part of one shape. This killed them in the circuit domain, where a user might just go on and on drawing wires and resistors without ever lifting the pen. Trying to break a polyline into its n lines and process each one separately might be too time-consuming, but if the user had drawn all the lines separately, recognition rates would have been higher. Sure, the system would have slowed down, but it's going to do that anyway the more you draw. Again, another tradeoff between accuracy and complexity.
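A quick count shows why trying every grouping is scary: n consecutively drawn lines can be split into contiguous groups in 2^(n-1) ways, since each of the n-1 interior corners is either a shape boundary or not. This is my back-of-the-envelope reasoning, not a figure from the paper.

```python
# Number of ways to group n consecutive lines into contiguous shapes:
# each of the n-1 interior corners is a cut or not, giving 2**(n-1).

for n in (4, 8, 12, 20):
    print(f"{n:2d} lines -> {2 ** (n - 1):,} possible groupings")
```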

Also, in Figure 8, they note the bottom-left ground symbol is too messy for their top-down recognizer to correct. Honestly, it looks fine to me. I would like a way to incorporate vision-based approaches for exactly this kind of error. It might be too messy when you crunch the numbers, but it /looks/ like a ground. Perhaps combining a vision-based likelihood could boost it up?
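If I were to prototype that combination, the simplest thing I can think of is a log-linear blend of SketchREAD's structural probability with a vision-based match score (e.g., from template or image matching). Everything below, the weights, the scores, the function name, is invented for illustration; it is a sketch of the idea, not anything from the paper.

```python
# Hypothetical blend of a structural (Bayesian-network) probability with a
# vision-based match score. All numbers and weights here are made up.

def combined_belief(structural_p, vision_p, w_struct=0.6, w_vision=0.4):
    """Log-linear mixture of the two likelihoods (weights sum to 1)."""
    return (structural_p ** w_struct) * (vision_p ** w_vision)

# A messy ground: the stroke parse gives it little support, but it *looks*
# right to a vision matcher, so the blend more than doubles its belief.
messy_ground = combined_belief(structural_p=0.10, vision_p=0.85)
print(f"structural alone: 0.100, blended: {messy_ground:.3f}")
```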

2 comments:

Paul Taele said...

I've been staring at Figure 8 for five minutes, and my eyes hurt. :D I guess, because of the work on my final project, I'm more understanding of that ground not being recognized. What surprised me more was how the bottom-right ground was recognized. Maybe this stems from my not remembering how the domain performed (is that particular ground missing a line?). The misrecognition does seem like something a Sezgin recognizer in the LADDER debugger would have done as well. Doesn't SketchREAD use LADDER in the background? I'm only going off my memory of the paper's citations.

print said...

how to prune the line hypothesis?