Wednesday, December 12, 2007

Sezgin -- Temporal Patterns

Sezgin, Tevfik Metin, and Randall Davis. "Sketch Interpretation Using Multiscale Models of Temporal Patterns." IEEE Computer Graphics and Applications 27, no. 1 (Jan/Feb 2007): 28--37.

Summary



Sezgin seeks to model sketches by analyzing the order in which strokes are made. He constructs two types of models. The first is a stroke-level model trained on how strokes are drawn in relation to one another: it looks at the order in which certain strokes are drawn, under the assumption that objects are drawn in roughly the same stroke order each time.

The second level of recognition occurs at the object level. This model combines the stroke-level models in a way that describes the order in which objects tend to occur relative to one another. For instance, when drawing a resistor, users might typically draw a wire, then the resistor, then another wire.

Both models are represented as dynamic Bayesian networks (DBNs). The DBNs capture the temporal dependencies among strokes and objects, acting like first-order Markov models, and are trained on a set of sketches. Each sketch is preprocessed and broken into a set of primitives (lines, arcs, etc.), and an observation vector is created for each primitive. The observation vectors are fed into the DBNs to train the state-transition and emission probabilities. For classification, a sketch is broken into primitives and observation vectors in the same way and fed into the DBNs; the representation with the highest likelihood is chosen as the classification.
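To make the likelihood-based classification concrete, here is a minimal sketch in which a plain first-order HMM stands in for the paper's DBNs. The class names, state counts, and all probabilities below are invented for illustration; observations are discrete primitive types (0 = line, 1 = arc) rather than the paper's full observation vectors.

```python
import math

def forward_log_likelihood(obs, pi, A, B):
    """Log-likelihood of an observation sequence under a discrete HMM.
    pi: initial state probabilities, A: state transitions, B: emissions.
    Uses scaling at each step to avoid numerical underflow."""
    n = len(pi)
    alpha = [pi[s] * B[s][obs[0]] for s in range(n)]
    log_lik = 0.0
    for o in obs[1:]:
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
        alpha = [sum(alpha[s] * A[s][t] for s in range(n)) * B[t][o]
                 for t in range(n)]
    return log_lik + math.log(sum(alpha))

def classify(obs, models):
    """Pick the class whose model gives the sequence the highest likelihood,
    mirroring the paper's maximum-likelihood decision rule."""
    return max(models, key=lambda c: forward_log_likelihood(obs, *models[c]))

# Toy per-class models (pi, A, B); observation 0 = line, 1 = arc.
models = {
    # "wire": long runs of line primitives
    "wire": ([1.0, 0.0],
             [[0.9, 0.1], [0.1, 0.9]],
             [[0.95, 0.05], [0.5, 0.5]]),
    # "resistor": alternating line/arc zig-zag
    "resistor": ([1.0, 0.0],
                 [[0.2, 0.8], [0.8, 0.2]],
                 [[0.1, 0.9], [0.9, 0.1]]),
}

print(classify([0, 0, 0, 0], models))  # a run of lines -> "wire"
print(classify([1, 0, 1, 0], models))  # alternation -> "resistor"
```

In the actual system each class model is a trained DBN rather than a hand-set HMM, but the decision rule is the same: score the primitive sequence under every model and keep the best.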

Results were collected from eight users who each drew 10 sketches of electrical diagrams. The system achieved roughly 80-90% accuracy over the entire sketch. One of the biggest weaknesses of the approach is that it is time-based, so any strokes drawn out of the expected order can throw a wrench in the works.

Discussion



This was a pretty neat approach to sketch recognition, something we've not seen before. I like the focus on temporal patterns, because I do feel a lot of information can be extracted from stroke and object ordering. However, as shown in the paper, it can also be a hindrance: strokes drawn out of the expected order cause recognition errors. This might be correctable with more training data, where outliers become more common and the system can account for them.

They might also be able to account for out-of-order strokes by adding a bit of ergodic behavior to the models. Right now everything moves left to right, from the start to the completion of each object and then on to the next object. Ergodic behavior would allow back transitions, self transitions, and possibly jumps from the middle of one object to another.
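The difference between the two structures can be shown directly on transition matrices. This is only an illustration of the idea, not the paper's method: the state count and the smoothing constant `eps` are invented, and real systems would retrain the probabilities rather than smooth them.

```python
# A left-to-right transition matrix: each state can only stay put
# or advance, so a backward (out-of-order) transition has probability 0
# and forces the whole sequence to zero likelihood.
left_to_right = [
    [0.7, 0.3, 0.0],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
]

def make_ergodic(A, eps=0.05):
    """Mix a small uniform component into each row so every transition,
    including self-loops and backward jumps, has nonzero probability.
    Rows still sum to 1 because (1 - eps) * 1 + n * (eps / n) = 1."""
    n = len(A)
    return [[(1 - eps) * p + eps / n for p in row] for row in A]

ergodic = make_ergodic(left_to_right)
# Out-of-order strokes no longer hit a zero-probability path:
assert all(p > 0 for row in ergodic for p in row)
```

The trade-off is that allowing every transition dilutes the temporal signal the models rely on, which is presumably why the paper keeps the stricter structure.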

Sezgin mentioned that his PhD work dealt with creating a system that could "suspend" one object's model and start another if strokes occurred out of order. Of course, what happens if you then start a third? Is there a way to generalize this approach so that any number of objects can be started at once?

The recognition results weren't great, but this is a groundbreaking paper, since nothing quite like it has really been done before, and the recognition rates will improve as the models do. I would like to see something like this combined with a vision-based approach: a system where strokes and objects are identified not just by the order in which they appear but also by their spatial arrangement. That might help eliminate some of the temporal confusion Sezgin encountered, if something "far away" were not considered even when it occurred next.
