Thursday, October 11, 2007

Veselova - Shape Descriptions

Veselova, Olya and Randall Davis. "Perceptually Based Learning of Shape Descriptions for Sketch Recognition." AAAI 2004.

Summary

Veselova and Davis want to build a language for describing sketched iconic shapes, along with a system that uses that language to recognize instances of those shapes. Much of the difficulty lies in the subjective nature of perception: what different people mean by "different" and "same," and which features they consider significant when making those judgment calls. Drawing on the psychology of perception, the authors propose a system that extracts the significant features and weights them in order to decide whether two shapes should be regarded as the same.

Based on perceptual studies, the authors split the vocabulary used to describe constraints between primitives into two categories. The first category is the singularities: properties of a shape where "small variations in them make a qualitative difference" in how the shape is perceived. Examples of singularities are the notions of vertical, horizontal, parallel, and straight. The second category is everything else: non-singular constraints that tolerate more variation before a shape reads as different. After constructing the vocabulary, the authors examined each constraint on a set of shapes to judge, subjectively, which ones mattered most to perceived similarity, and assigned each constraint a rank accordingly.
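To make the ranking idea concrete, here is a minimal sketch of what a ranked constraint vocabulary might look like in code. The constraint names follow the paper's examples, but the numeric ranks are placeholders I made up, not the values from the paper's table in Section 3.2.

    # Illustrative ranks only; the paper's Section 3.2 table has its own values.
    CONSTRAINT_RANKS = {
        # Singularities: small violations cause a qualitative perceptual
        # change, so they get the highest ranks.
        "vertical": 10,
        "horizontal": 10,
        "parallel": 9,
        "straight": 9,
        # Non-singular constraints tolerate more variation and rank lower.
        "longer-than": 4,
        "left-of": 3,
        "above": 3,
    }

    def base_importance(constraint):
        """Look up a constraint's rank, defaulting to a low value."""
        return CONSTRAINT_RANKS.get(constraint, 1)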

The importance of each constraint is defined not only by its rank relative to its peers, but is also adjusted using heuristics that account for obstruction, tension lines, and grouping. Obstruction measures how many primitives lie between two objects a and b: if the obstruction is high, the user is unlikely to have intended a constraint between a and b, whereas if it is low, a constraint between them is much more plausible. Tension lines describe how the endpoints and midpoints of lines align with one another; the human visual system favors alignment, so constraints between aligned elements are boosted. Finally, grouped shapes are perceived as wholes rather than as individuals, so constraints between members of the same group are boosted, while constraints between objects in different groups are weighted down.
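As a rough sketch of how these three heuristics might combine with a base rank (like one from the table sketched above), consider the following. All the coefficient values (0.5, 1.2, 1.3, 0.8) are invented for illustration, since the paper does not publish them, and the inputs stand in for real geometric tests.

    def adjusted_importance(rank, n_obstructing, endpoints_aligned, same_group):
        """Adjust a constraint's base rank using the three heuristics.
        All coefficients here are made up for illustration."""
        w = float(rank)
        w *= 0.5 ** n_obstructing          # obstruction: damp per intervening primitive
        if endpoints_aligned:
            w *= 1.2                       # tension lines: reward alignment
        w *= 1.3 if same_group else 0.8    # grouping: reward within-group pairs
        return w

    # e.g., a "parallel" constraint (rank 9) between two grouped, aligned
    # lines with one primitive between them: 9 * 0.5 * 1.2 * 1.3 = 7.02
    print(adjusted_importance(9, n_obstructing=1, endpoints_aligned=True, same_group=True))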

To evaluate their work, the authors had 33 users look at 20 variations ("near-miss" examples, similar to Hammond and Davis's work) of each of 9 shapes. The variations were chosen so that half still met the constraints and half did not, and each user answered "yes" or "no" depending on whether they felt the variation still counted as an example of that shape. On shapes where at least 90% of the users agreed with one another on the right answer, the system matched the majority answer 95% of the time, the same rate as a user selected at random. On shapes where 80% of the users agreed, the system matched the majority 83% of the time, versus 91% for a random user. So, for the most part, the system's similarity judgments agreed with the majority of the human answers.
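The headline numbers are easier to parse if you write the metric down. Here is a toy sketch of agreement-with-majority as I read it: for each variation, compare the system's yes/no answer to the majority of the users' answers. The data below is invented, not the study's.

    def majority(votes):
        """Majority yes/no answer from a list of booleans."""
        return sum(votes) * 2 > len(votes)

    def agreement_with_majority(system_answers, user_votes_per_item):
        """Fraction of variations where the system matches the user majority."""
        hits = sum(ans == majority(votes)
                   for ans, votes in zip(system_answers, user_votes_per_item))
        return hits / len(system_answers)

    # Toy example: 3 variations, 5 users each.
    votes = [[True, True, True, False, True],     # majority: yes
             [False, False, True, False, False],  # majority: no
             [True, False, True, True, False]]    # majority: yes
    print(agreement_with_majority([True, False, False], votes))  # 2 of 3 -> 0.666...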

Future work for this project includes building a shape recognizer that can use this system, supporting primitives beyond lines and arcs, adding ways to express extremes and "must not" constraints, and handling more than just pairwise constraints. The authors also want the system to handle objects with an arbitrary number of primitives (like a dashed line: how many dashes?).

Discussion

It sounds like the authors used their own opinions to determine the ranking values for the constraints listed in the table in Section 3.2. I wish they had performed a user study instead. On the one hand, I can see why they would be well suited to assigning these values: they understand the details of the problem and have the expert knowledge to know what to look for and what might make the most difference. On the other hand, that same expertise is exactly what makes their opinion less credible. A real-world system is not going to be used by experts 24/7; a bunch of naive and ignorant (to the domain and problem of sketch recognition, not in general) users are going to make their own decisions about what constitutes similarity and which features matter in those judgment calls. In short, I would have liked to see more user input in the vocabulary ranking stage.

I also wonder how the scalar coefficients were selected for the penalties on obstruction, tension lines, and grouping. How much do these values affect the final prediction? I would like to see a graph of results versus values for these variables; as written, the coefficients look like a best guess, and the paper gives me no reason to believe otherwise. In their experiments section, the authors state that "people seemed to pay less attention to individual detail (aspect ratio, precise position, etc.) of the composing shapes in the symbol than the system biases accounted for." I wonder, then, whether those biases could be corrected by experimenting with the constraint rankings and the coefficients above, along the lines of the sweep sketched below.
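The kind of sensitivity check I have in mind could be as simple as this sweep: vary one coefficient, re-run the yes/no predictions against the user data, and plot agreement against the coefficient's value. The evaluate function here is a hypothetical placeholder, since we don't have their engine or their data.

    import numpy as np
    import matplotlib.pyplot as plt

    def evaluate(obstruction_coeff):
        # Placeholder: in reality this would re-score all variations with
        # the given coefficient and return agreement with the user majority.
        # The fake curve below just lets the sweep run end to end.
        return 0.8 - 0.3 * abs(obstruction_coeff - 0.5)

    coeffs = np.linspace(0.1, 1.0, 10)
    scores = [evaluate(c) for c in coeffs]
    plt.plot(coeffs, scores, marker="o")
    plt.xlabel("obstruction coefficient")
    plt.ylabel("agreement with user majority")
    plt.show()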

Also, they said they don't have a shape recognition engine built yet, so I am confused about how they performed the experiments. Is it that they don't actually recognize any shapes, but instead generated the 20 examples by hand and manually violated constraints? If so, this seems like a Bad Thing (TM) that is very prone to error and to even more subjectivity. Maybe they can borrow Hammond's near-miss generator.

But after saying all that, I liked this paper. The authors seem to have found a principled way to say which constraints matter more than others, and to their credit, they get decent results agreeing with the majority of users. Still, without a sketch recognition engine to do these things automatically, I wonder how biased their results are.
