Friday, December 14, 2007

Handwriting in LADDER

FLV video/demo

Presentation

All stuff Copyright 14 Dec. 2007, Joshua Johnston.

Thursday, December 13, 2007

Alvarado and Lazzareschi -- Properties of Diagrams

Alvarado, Christine, and Michael Lazzareschi. "Properties of Real-World Digital Logic Diagrams." PLT 2007.

Summary



Alvarado and Lazzareschi examine unconstrained sketches of digital logic diagrams created by students in a real-world setting: the classroom. Thirteen students in a digital logic class were given Tablet PCs, and all notes, lab work, etc., were completed by sketching on the tablet. The authors took the first portion of the class's notes (the portion dealing with low-level diagramming) and extracted 98 individual diagrams, which were then labeled by hand to identify the symbols. Analysis was performed on the labeled diagrams to determine the impact of stroke order, stroke timing, and the number of strokes students use when drawing real sketches. This analysis sheds light on many of the underlying assumptions that sketch recognition algorithms make to boost their recognition rates.

The authors examined whether stroke ordering could be used to help delineate different objects in the diagrams by checking whether students finished a shape before moving on to stroke in a different location. This assumption was violated almost 20% of the time: of all the symbols in the diagrams, 20% were drawn with nonconsecutive strokes. While many of these strokes appeared to be overtracing or touch-up, this is still a significant amount.

The authors next looked at the timing of strokes from one object to another, to see whether different shapes in the diagram could be delineated using a threshold on pause time. They did find that, on average, users made shorter pauses between strokes within the same shape and longer pauses between shapes, and this difference is significant under a Wilcoxon rank-sum test. However, there is a great deal of overlap in how long users paused. The authors determined, for each user, the "optimal" pause threshold for delineating strokes and computed the recognition error rates that would result if timing were the only basis for deciding that a new symbol had begun. The error was 10% at minimum, 35% at maximum, and roughly 20% on average, which is significant.
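
To make the timing analysis concrete, here is a minimal sketch (my own, with a hypothetical data format; not the authors' code) of how one could pick a per-user pause threshold and measure the error a timing-only segmenter would make:

    # Hypothetical illustration: find the pause threshold that best separates
    # intra-shape gaps from inter-shape gaps for one user, then report the
    # error rate a timing-only segmenter would incur.
    def best_pause_threshold(intra_gaps, inter_gaps):
        """intra_gaps: pauses (seconds) between strokes of the same symbol.
        inter_gaps: pauses between the last stroke of one symbol and the
        first stroke of the next.  Returns (threshold, error_rate)."""
        candidates = sorted(set(intra_gaps) | set(inter_gaps))
        best = (None, 1.0)
        for t in candidates:
            false_splits  = sum(1 for g in intra_gaps if g > t)   # one symbol split in two
            missed_splits = sum(1 for g in inter_gaps if g <= t)  # two symbols merged
            err = (false_splits + missed_splits) / (len(intra_gaps) + len(inter_gaps))
            if err < best[1]:
                best = (t, err)
        return best

    # Toy numbers with heavily overlapping gap distributions, as in the paper.
    intra = [0.2, 0.3, 0.4, 0.9, 1.5]
    inter = [0.5, 0.8, 1.2, 2.0, 3.5]
    threshold, error = best_pause_threshold(intra, inter)
    print(f"threshold = {threshold:.1f}s, error = {error:.0%}")

Even with a per-user optimal threshold, the overlap between the two gap distributions is what drives the 10-35% error the authors report.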

Last, the authors looked at the number of strokes used to draw each symbol. They found that not only did the number of strokes per symbol vary from student to student and from symbol to symbol, but that it also varied when the same user drew the same symbol. Moreover, some users were more consistent, while others showed more variation. However, the authors did conclude that in this domain (digital logic diagrams), the overwhelming majority of strokes were used to draw only a single symbol. This lends some support to the assumption that a single stroke does not span more than one symbol, although the authors believe this is domain specific.

Discussion



I really liked this paper, as well as the other Harvey Mudd paper about interface design issues. They have both shed a lot of light on what must be done for sketch recognition to be a viable player in the education community. I limit this to that particular community because I wonder whether the same problems would be found in a professional setting.

For instance, the authors talk about students returning to symbols to touch them up or overtrace, possibly while thinking about what to draw next. It seems like a professional would not have these tendencies and would be more inclined to "get it right the first time." Not to bash on students, but they are, after all, students, learning these things for the first time. Of course they aren't going to process things as quickly as a professional.

Also, I don't think some of the other findings, such as using a single stroke to draw multiple symbols, or the number of strokes used to create symbols, would hold with professionals. First, I think pros would have a very set and familiar way of drawing symbols, so there would be little variation between different instances of their own symbols. Second, I think that with expertise comes a standard way of drawing things even across pros, or at least everyone gets more comfortable and the variation in how symbols are drawn tends to decrease. Also, the more experienced and comfortable you are with laying your diagram out "by the seat of your pants," with only a mental image in your head, the more you would start to draw in single strokes that run from one shape to the next, only lifting your pen when you had to.

I also think the timings for intra- and inter-shape strokes would be closer together, as a pro does not have as much need to pause from one shape to the other. They know these things and can do them on the fly with little error.

Wais et al. -- Perception of Interface Elements

Wais, Paul, Aaron Wolin, and Christine Alvarado. "Designing a Sketch Recognition Front-End: User Perception of Interface Elements." Eurographics 2007.

Summary



Wais, Wolin, and Alvarado perform a Wizard of Oz study to examine several interface aspects of a sketch-based circuit diagram application. They investigated three aspects of sketch recognition systems:

  1. When to perform recognition and how to provide feedback about the results

  2. How the user can indicate which parts of the sketch she wants recognized

  3. How recognition errors impact usability



The authors experimented with nine subjects, asking them to draw circuit diagrams with a pen-based input device (the subjects had experience with both the domain and the devices). Three sets of experiments tested the aspects above. For the first, the system used either a button press, a check-tap gesture, or a four-second delay as the trigger for starting stroke recognition. Users preferred the button because it gave them reliable control (the gesture was difficult to recognize, working only about one time in six), and they tended to use it to recognize large chunks of the sketch at once. When recognition occurred, most users preferred the stroke lines to change color based on recognition status (recognized or not) rather than having text labels appear, since the labels cluttered the drawing. However, most users still hovered over the "recognized" portions (no actual recognition took place) to check whether the label was correct. The system also introduced different types of random errors into the sketch. The errors themselves didn't seem to bother users too much, as long as they were predictable and lent themselves to learning and correction; random errors did not, and were frustrating. Users also wanted separate spaces, or separate colors, to indicate what in a sketch should be recognized and what should be left alone.

Discussion



This was a neat paper. It was nice to see an interface paper saying what people actually liked and what works in a sketch recognition application. I'm not an interfaces person, so this was really the first time I had seen this addressed.

Regarding their pause trigger for recognition: it seems like this might be something that can be learned. Say you take the mean amount of time between the user's last pen stroke and when they press the 'recognize' button; that would capture how long that user typically waits before asking for recognition (see the sketch below). Of course, I think using a pause is a bad idea in general. Users expressed the desire for control and predictability. Give it to them rather than having a magical timeout that they neither see nor control.
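
A minimal sketch of that idea (purely illustrative; the event format is hypothetical and not from the paper's system):

    # Learn a per-user recognition delay from observed pairs of
    # (last pen-up time, recognize-button-press time).
    from statistics import mean, stdev

    def learn_trigger_delay(pen_up_times, button_times, padding=1.0):
        """Return a pause length in seconds to use as an automatic trigger,
        padded so the system doesn't fire before the user expects it."""
        waits = [b - p for p, b in zip(pen_up_times, button_times) if b > p]
        if len(waits) < 2:
            return 4.0  # fall back to the paper's fixed 4-second delay
        return mean(waits) + padding * stdev(waits)

    # A user who typically waits about two seconds before pressing the
    # button would get a trigger slightly longer than that.
    pen_ups = [10.0, 25.0, 41.0]
    presses = [12.1, 26.8, 43.3]
    print(round(learn_trigger_delay(pen_ups, presses), 2))

Even then, the threshold would have to adapt slowly and visibly, or it runs into the same predictability complaint.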

I would have liked to see some of the interface issues around correcting errors. Their method is just for the user to erase the problematic stroke and draw it again. Obviously they're just using a dummy application for their Wizard of Oz approach, but I think it would be nice to have a drop-down for an n-best list, one of the options being "None of the Above, Plan 9." This magical option would return the strokes to their original form and allow the user to group, with a lasso or something, the things that should be recognized. This seems like a burden on the user, especially when the push is for things that are fully automatic and uber-accurate. Well, maybe that's not realistic, at least not yet. Even if a system is uber-accurate, it will still make mistakes. If the user can help it fix those mistakes, it can possibly learn. Or, at the least, make it easier for the user to clean up your program's boo boos.

Also, I've nearly decided that gestures are horrible. We already have an application with imperfect sketch recognition, and we're throwing in mega-imperfect gestures! I wonder if anyone knows if there is a good gesture toolkit out there, and I'm talking like 99.99%, none of this 1/6 (ONE CORRECT TO FIVE INCORRECT, 16%) laaaaaame product. But this isn't their fault, it's the gesture toolkit's fault.

Wednesday, December 12, 2007

Alvarado -- SketchREAD

Alvarado, Christine, and Randall Davis. "SketchREAD: A Multi-Domain Sketch Recognition Engine." UIST 2004.

Summary



Alvarado presents a new engine for sketch recognition built on dynamically constructed Bayesian networks. Incoming strokes are parsed, as they are drawn, into sets of low-level primitives using Sezgin and Stahovich's toolkit. These primitives are fed into Bayesian networks that give probabilities for the strokes forming certain shapes. Predefined network templates are instantiated and connected together to match the primitives and link them in a logical fashion. The networks are interconnected to yield probabilities for higher-level shapes and for what the authors call domain shapes and domain patterns.

The networks are used for bottom-up recognition, where the primitives are pieced together into shapes, and also for top-down recognition. The top-down portion allows the engine to compensate for shapes that have not yet been finished, and to revisit primitives and shapes and reassess their interpretations. For instance, if a stroke is split into a 5-line polyline but the domain has no 5-line shapes, the top-down recognizer can prune the 5-line hypothesis and generate others to test, such as 2-, 3-, and 4-line hypotheses. Hypotheses are generated at each stage of recognition to see which interpretation of the strokes/shapes is most likely. The system prunes hypotheses without enough support to keep the state space from exploding, but this sometimes removes correct hypotheses that simply have not had enough time to develop (e.g., the user is still drawing finishing strokes). The top-down recognizer is expected to re-add these pruned-but-correct hypotheses later.
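
A rough illustration of the pruning idea (my own toy structures and thresholds, not SketchREAD's actual implementation):

    # Toy hypothesis pruning: keep hypotheses ranked by their current
    # support, drop weak ones, but never empty the list, since a weak
    # hypothesis may just be a shape the user hasn't finished yet.
    from dataclasses import dataclass

    @dataclass
    class Hypothesis:
        shape: str          # e.g. "and-gate", "wire"
        strokes: tuple      # ids of the strokes it explains
        probability: float  # current support from the Bayesian network

    def prune(hypotheses, threshold=0.05, keep_at_least=1):
        ranked = sorted(hypotheses, key=lambda h: h.probability, reverse=True)
        kept = [h for h in ranked if h.probability >= threshold]
        return kept if len(kept) >= keep_at_least else ranked[:keep_at_least]

    hyps = [Hypothesis("and-gate", (1, 2, 3), 0.62),
            Hypothesis("or-gate",  (1, 2, 3), 0.30),
            Hypothesis("wire",     (1,),      0.02)]
    for h in prune(hyps):
        print(h.shape, h.probability)

The hard part, of course, is deciding when a low-probability hypothesis is noise and when it is simply unfinished, which is exactly where the top-down pass earns its keep.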

The engine was tested on two sets of sketches, 10 family trees that were relatively clean and simple, and 80 circuit diagrams, which were far more complex. On average, SketchREAD got 77% of the family tree shapes right, and 62% of the circuit shapes right.

One of the reasons SketchREAD performed "poorly" is that it makes several limiting assumptions to help avoid state-space explosion when searching for hypotheses. The first is that all the lines in a polyline are considered to be one shape, instead of trying each line separately. Additionally, the template networks were often not matched to the strokes correctly, and the engine was not allowed to try all possible matches.

Discussion



This was a pretty neat approach. I like the way low-level errors were corrected later on. This seems like a must-have approach for any system that hopes to be widely successful. Of course, if you make better low-level recognizers, like Paulson's, this doesn't become too much of an issue. But still, nothing is perfect, as Alvarado mentions.

There were several limitations in the system that I wish Alvarado could get around. I understand they're there to limit the size of the hypothesis space, but they ended up making things hard for her to get right. One of them was that all the lines in a polyline were considered to be part of one shape. This killed them in the circuit domain, where a user may just go on and on drawing wires and resistors without ever lifting the pen. Trying to break such a stroke into n lines and process each line separately might be too time consuming, but if a user had drawn all the lines separately, recognition rates would have been higher. Sure, the system would have slowed down, but it's going to do that anyway the more you draw. Again, another tradeoff between accuracy and complexity.

Also, in Figure 8, they note the bottom-left ground is too messy for their top-down recognizer to correct. Honestly, it looks fine to me. I would like a way to incorporate vision-based approaches for exactly this kind of error. It might be too messy when you crunch the numbers, but it /looks/ like a ground. Perhaps combining in a vision-based likelihood could boost it up?

Sezgin -- Temporal Patterns

Sezgin, Tevfik Metin, and Randall Davis. "Sketch Interpretation Using Multiscale Models of Temporal Patterns." IEEE Computer Graphics and Applications, Jan./Feb. 2007: 28--37.

Summary



Sezgin seeks to model sketches by analyzing the order in which strokes are made. He constructs two types of models. The first is a stroke-level model trained on how the strokes of an object are drawn in relation to one another; it looks at the order in which strokes are drawn, under the assumption that an object is drawn in generally the same stroke order each time.

The second level of recognition occurs at the object level. This model combines the stroke-level models in a manner that describes in what order objects tend to occur in relation to one another. For instance, when drawing a resistor, users might usually draw a wire, then the resistor, then another wire.

Both models are represented as dynamic Bayesian networks (DBNs). The DBNs capture the temporal dependencies of the strokes and objects, acting like first-order Markov models, and are trained on a set of sketches. Each sketch is preprocessed and broken into a set of primitives (lines, arcs, etc.), and an observation vector is created for each primitive. The observation vectors are fed into the DBNs, and the state-transition probabilities and other parameters are learned. For classification, a sketch is broken into primitives and observation vectors in the same way and fed into the DBNs; the interpretation with the highest likelihood is chosen as the classification.
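
As a much-simplified stand-in for those DBNs (my own toy, not Sezgin's model), the core temporal idea can be sketched with one first-order Markov chain over primitive labels per object class; the class whose chain assigns the highest likelihood to an observed primitive sequence wins:

    # Toy first-order Markov-chain classifier over primitive-label
    # sequences, illustrating how drawing order alone can separate
    # classes.  Labels and counts here are invented.
    import math
    from collections import defaultdict

    def train_chain(sequences, smoothing=1.0):
        """Estimate transition log-probabilities from training sequences."""
        counts = defaultdict(lambda: defaultdict(float))
        for seq in sequences:
            for a, b in zip(seq, seq[1:]):
                counts[a][b] += 1
        chain = {}
        for a, nexts in counts.items():
            total = sum(nexts.values()) + smoothing * len(nexts)
            chain[a] = {b: math.log((c + smoothing) / total) for b, c in nexts.items()}
        return chain

    def log_likelihood(chain, seq, unseen=-10.0):
        return sum(chain.get(a, {}).get(b, unseen) for a, b in zip(seq, seq[1:]))

    # Resistors tend to be drawn wire -> zigzag -> wire; capacitors
    # wire -> line -> line -> wire (made-up drawing orders).
    resistor  = train_chain([["wire", "zigzag", "wire"]] * 5)
    capacitor = train_chain([["wire", "line", "line", "wire"]] * 5)
    observed = ["wire", "zigzag", "wire"]
    scores = {"resistor":  log_likelihood(resistor, observed),
              "capacitor": log_likelihood(capacitor, observed)}
    print(max(scores, key=scores.get))   # -> resistor

Sezgin's actual models use richer observation vectors and work at both the stroke and object levels, but the classification step has this same flavor.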

Results were collected on eight users who drew 10 sketches of electrical diagrams each. They got about 80-90% accuracy over the entire sketch. One of the biggest weaknesses in their approach is that it is time-based, so any strokes drawn out of the expected order can throw a wrench in the works.

Discussion



This was a pretty neat approach to sketching, something that we've not seen before. I like the way that temporal patterns are looked at, because I do feel a lot of information can be extracted from stroke/object ordering. However, as shown in the paper, it can also be a hindrance. Strokes that are drawn out of expected order can cause recognition errors. This might be correctable with more training data, where outliers become more common and the system can account for them.

They might also be able to account for out-of-order strokes by adding a bit of ergodic behavior to the models. Right now everything moves left to right, from the start to the completion of each object, and then on to the next object. Adding ergodic behavior would allow back transitions, self transitions, and possibly jumps from the middle of one object to another.

Sezgin mentioned that his PhD work dealt with creating a system that could "suspend" one object's model and start another if strokes occurred out of order. Of course, what happens then if you start a third? Is there a way to generalize this approach so you can have any number of objects started at once?

The recognition results weren't great, but this is a groundbreaking paper since nothing quite like it has been done before. The recognition rates will get better as the models improve. I would like to see something like this combined with a vision-based approach: a system where strokes and objects are identified not just by the order in which they appear, but also by their spatial arrangement. That might help eliminate some of the temporal confusion Sezgin encountered, if something "far away" were not considered even when it was drawn next in time.

Tuesday, December 11, 2007

Wobbrock et al. -- $1 Recognizer

Jacob Wobbrock, Andrew Wilson, and Yang Li. "Gestures without Libraries, Toolkits, or Training: A $1 Recognizer for User Interface Prototypes." UIST 2007.

Summary



Wobbrock et al. present a very simple single-stroke recognizer that uses a template-based approach, so named because it is cheap and easy to build. Their method works in four basic steps.

First, the stroke is resampled. Given a parameter N, the number of points in the resampled stroke, and the stroke path length L, $1 interpolates N points (possibly new) along the original stroke path, spaced L/(N-1) apart. This gives fast and slow strokes a uniform representation so they can be compared to each other (points on fast strokes are more sparsely spaced than on slower strokes of the same length).
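
A sketch of that resampling step, following the structure of the pseudocode published in the paper (variable names are mine):

    # Resample a stroke (a list of (x, y) tuples) to n points spaced
    # evenly along its path.
    import math

    def path_length(points):
        return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

    def resample(points, n=64):
        interval = path_length(points) / (n - 1)
        accumulated = 0.0
        pts = list(points)
        new_points = [pts[0]]
        i = 1
        while i < len(pts):
            d = math.dist(pts[i - 1], pts[i])
            if accumulated + d >= interval:
                t = (interval - accumulated) / d
                q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                     pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
                new_points.append(q)
                pts.insert(i, q)      # the new point starts the next segment
                accumulated = 0.0
            else:
                accumulated += d
            i += 1
        if len(new_points) < n:       # floating-point rounding can drop the last point
            new_points.append(pts[-1])
        return new_points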

Second, the resampled stroke is rotated so that comparisons are more or less rotationally invariant. The "indicative angle" is measured from the line segment connecting the stroke's centroid to its first point, and the stroke is rotated so that this angle lies at 0 degrees on the horizontal axis.
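
Roughly, in code (again following the paper's pseudocode):

    # Rotate the resampled points so the indicative angle (centroid to
    # first point) lies at 0 degrees.
    import math

    def centroid(points):
        n = len(points)
        return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

    def rotate_by(points, angle):
        cx, cy = centroid(points)
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        return [((x - cx) * cos_a - (y - cy) * sin_a + cx,
                 (x - cx) * sin_a + (y - cy) * cos_a + cy) for x, y in points]

    def rotate_to_zero(points):
        cx, cy = centroid(points)
        x0, y0 = points[0]
        return rotate_by(points, -math.atan2(y0 - cy, x0 - cx))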

Third, the resampled and rotated points are scaled so that their bounding box matches a reference square of a fixed size, disregarding the original aspect ratio. The points are then translated so that the centroid (the mean along x and y) sits at the origin (0,0). These steps make the comparison roughly scale and translation invariant.
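
In code, something like:

    # Scale non-uniformly to a reference square, then translate the
    # centroid to the origin.
    def scale_to_square(points, size=250.0):
        xs = [x for x, _ in points]
        ys = [y for _, y in points]
        width  = (max(xs) - min(xs)) or 1e-9   # guard against a perfectly flat stroke
        height = (max(ys) - min(ys)) or 1e-9
        return [(x * size / width, y * size / height) for x, y in points]

    def translate_to_origin(points):
        n = len(points)
        cx = sum(x for x, _ in points) / n
        cy = sum(y for _, y in points) / n
        return [(x - cx, y - cy) for x, y in points]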

Fourth, the stroke is compared to each template over a series of small rotations to find the template that matches best. The rotation is searched between a minimum and maximum angle using a golden section search. The score for a stroke/template pair is the average distance between corresponding points of the stroke and the template, divided by half the diagonal of the reference square (a cap on how far apart they can be), and subtracted from 1 so that higher scores are better. The templates are ranked according to score and returned.
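
The matching step looks roughly like this (mirroring the paper's golden-section search; it carries its own centroid-rotation helper so the snippet stands alone):

    # Find the rotation of the candidate that minimizes the average
    # point-to-point distance to a template, then turn that distance
    # into a 0..1 score.
    import math

    PHI = 0.5 * (-1 + math.sqrt(5))   # golden ratio

    def path_distance(a, b):
        return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

    def rotate_about_centroid(points, angle):
        n = len(points)
        cx = sum(x for x, _ in points) / n
        cy = sum(y for _, y in points) / n
        c, s = math.cos(angle), math.sin(angle)
        return [((x - cx) * c - (y - cy) * s + cx,
                 (x - cx) * s + (y - cy) * c + cy) for x, y in points]

    def distance_at_best_angle(points, template,
                               a=-math.radians(45), b=math.radians(45),
                               threshold=math.radians(2)):
        x1 = PHI * a + (1 - PHI) * b
        f1 = path_distance(rotate_about_centroid(points, x1), template)
        x2 = (1 - PHI) * a + PHI * b
        f2 = path_distance(rotate_about_centroid(points, x2), template)
        while abs(b - a) > threshold:
            if f1 < f2:
                b, x2, f2 = x2, x1, f1
                x1 = PHI * a + (1 - PHI) * b
                f1 = path_distance(rotate_about_centroid(points, x1), template)
            else:
                a, x1, f1 = x1, x2, f2
                x2 = (1 - PHI) * a + PHI * b
                f2 = path_distance(rotate_about_centroid(points, x2), template)
        return min(f1, f2)

    def score(points, template, size=250.0):
        d = distance_at_best_angle(points, template)
        return 1 - d / (0.5 * math.sqrt(2 * size ** 2))   # half the square's diagonal

Running score() against every stored template and sorting in descending order gives the ranked list the recognizer returns.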

The authors compared their recognizer against variations of Rubine's recognizer and a dynamic time warping (DTW) algorithm on a set of 16 gesture types with 10 entries each. DTW and $1 perform almost indistinguishably, but $1 is much simpler and runs much faster.

Discussion



First my beef with the title: no training or libraries? What do you call the templates, one per possible type of shape you want to recognize? I understand their hyperbole, but it seems a little ridiculous, even for UIST.

It's important to remember the purpose of this recognizer. It is not meant to be the next groundbreaking method that pushes 99.99% accuracy in every possible domain; even the title admits as much: "for User Interface Prototypes." This, I believe, is meant to be something you can slap together an hour before a customer or investor comes over and still get decent recognition results. The great part about this algorithm is that the entire source code is listed in the appendix, taking most of the mythos and shrouded mystery out of sketch recognition (no hidden Markov models or Bayes nets to code).