Thursday, October 25, 2007

Gross and Do -- the Napkin

Gross, Mark D., and Ellen Yi-Luen Do. "Ambiguous Intentions: A Paper-like Interface for Creative Design." ACM UIST 1996.

Summary



Gross and Do seek to implement a system in which the designer can sketch plans with a great deal of ambiguity and flexibility. This keeps the initial phase of design from being too cumbersome and restrictive, saving possible interpretations and constraints for application later in the drawing process. It also relieves the system of some of its responsibilities: it is not forced to make accurate predictions for all drawn objects right away, since some can be deferred until more contextual clues are available.

Their system recognizes drawn primitives, called glyphs (boxes, circles, lines, points, etc.), using a simple templating approach. A stroke is analyzed and compared against stored example strokes; it is classified as a given type if it matches one of the templates with sufficient certainty. Candidate matches are ranked against thresholds, with contextual clues helping to break close ties.
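
To make the matching concrete, here is a minimal sketch of the grid-path idea in Python (the function names and templates are mine, not the paper's): the stroke's bounding box is split into a 3x3 grid, the sequence of cells the pen passes through becomes the stroke's signature, and that signature is looked up against stored templates. The real system matches with tolerances and certainty rankings rather than exact equality.

def grid_path(points, n=3):
    """Reduce a stroke to the sequence of n x n grid cells it passes through."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    w = max(max(xs) - min_x, 1e-9)  # guard against zero-width strokes
    h = max(max(ys) - min_y, 1e-9)
    path = []
    for x, y in points:
        col = min(int((x - min_x) / w * n), n - 1)
        row = min(int((y - min_y) / h * n), n - 1)
        cell = row * n + col
        if not path or path[-1] != cell:  # record only cell transitions
            path.append(cell)
    return tuple(path)

# Example templates: cell sequences for a clockwise box and a diagonal line.
TEMPLATES = {
    (0, 1, 2, 5, 8, 7, 6, 3, 0): "box",
    (0, 4, 8): "diagonal line",
}

def classify(points):
    return TEMPLATES.get(grid_path(points), "unknown")

square = [(0, 0), (5, 0), (10, 0), (10, 5), (10, 10),
          (5, 10), (0, 10), (0, 5), (0, 0)]
print(classify(square))  # -> box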

Context is established by the user giving the system examples. For instance, to say that four little squares sitting around a large square means a dining room table, the user would draw the boxes and edit the constraints on the various shapes. However, in one context those five boxes might mean a table (in a room plan for a house, for instance), while in another they might mean houses situated around a pond (in a landscaping drawing). Thus, users can define different contexts, which the system identifies by looking for symbols specific to one context or another. Contexts are kept in chains from specific to general; both the room-plan and landscaping chains would contain the five-box symbol, with a different meaning in each chain.
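
As a rough sketch of how such context chains might resolve an ambiguous symbol (the data structures and names here are my guesses, not the paper's implementation):

CONTEXT_CHAINS = {
    "floor plan":  ["room plan", "floor plan", "architecture"],
    "landscaping": ["site plan", "landscaping", "architecture"],
}

MEANINGS = {
    ("five-box cluster", "room plan"): "dining table with four chairs",
    ("five-box cluster", "site plan"): "houses around a pond",
}

def interpret(symbol, context):
    # Walk the chain from most specific to most general context.
    for ctx in CONTEXT_CHAINS[context]:
        if (symbol, ctx) in MEANINGS:
            return MEANINGS[(symbol, ctx)]
    return None  # defer interpretation until more clues arrive

print(interpret("five-box cluster", "floor plan"))   # dining table with four chairs
print(interpret("five-box cluster", "landscaping"))  # houses around a pond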

Discussion



Defining the constraints/contextual clues requires user input, and the constraints themselves seem like they might be very rigid. Since the user has to draw examples of the constraints and context clues, this seems like a very time-consuming process. Coupled with the difficulty of generating constraint lists from drawings (which we saw in LADDER), this seems somewhat inefficient. However, I suppose that's the difficulty of describing constraints with a fixed language: it's hard enough for one person to describe a geometrical layout to another person using any vocabulary of their choosing. Intent is very difficult to capture, especially with a limited grammar.
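
To see why a fixed vocabulary feels rigid, here is a hypothetical LADDER-style constraint list for the table symbol (the operator names and checker are entirely mine): even a simple layout takes a pile of pairwise relations, and anything the vocabulary can't express is simply unsayable.

# Hypothetical fixed constraint vocabulary; not the paper's actual language.
def above(a, b):    return a["cy"] < b["cy"]  # screen y grows downward
def left_of(a, b):  return a["cx"] < b["cx"]
def smaller(a, b):  return a["w"] * a["h"] < b["w"] * b["h"]

VOCAB = {"above": above, "leftOf": left_of, "smaller": smaller}

# Small squares around a big one, described one relation at a time.
TABLE_CONSTRAINTS = [
    ("smaller", "chair1", "table"), ("above",  "chair1", "table"),
    ("smaller", "chair2", "table"), ("leftOf", "chair2", "table"),
]

def matches(constraints, shapes):
    """shapes maps a name to {cx, cy, w, h} for its bounding box."""
    return all(VOCAB[op](shapes[a], shapes[b]) for op, a, b in constraints)

shapes = {
    "table":  {"cx": 5, "cy": 5, "w": 4, "h": 4},
    "chair1": {"cx": 5, "cy": 1, "w": 1, "h": 1},
    "chair2": {"cx": 1, "cy": 5, "w": 1, "h": 1},
}
print(matches(TABLE_CONSTRAINTS, shapes))  # True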

Their system for matching glyphs seems very brute-force and hacked together, in my opinion, and it would have been nice to see results on their recognition rates. One problem I can see is their scheme of splitting the stroke's bounding box into a grid. What happens if you need more detail? The system won't be able to add new shapes that need finer granularity in their paths, and if you do add granularity, permuting the possible paths under rotation and reflection becomes more difficult. Templating also carries a lot of overhead: you have to store all the templates and make O(n) comparisons per stroke, which is not the case for other classification methods (like linear classifiers). I can see the strength of templating (it can avoid the user- and domain-dependent features of recognizers like Rubine's and Long's), but their version still feels hacked together. Maybe because they're architects.
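
To make that cost comparison concrete, here is a toy contrast between a template scan and a Rubine-style linear classifier (the feature vectors and weights are made up for illustration):

import math

def nearest_template(features, templates):
    # Template matching: one distance computation per stored template, O(n).
    best_label, best_dist = None, math.inf
    for label, tmpl in templates:
        dist = sum((f - t) ** 2 for f, t in zip(features, tmpl))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def linear_classify(features, weights):
    # Linear classifier: one dot product per class, independent of the
    # number of training examples once the weights are learned.
    return max(weights, key=lambda label:
               sum(w * f for w, f in zip(weights[label], features)))

stroke = [0.9, 0.1]  # made-up features, e.g. (closedness, straightness)
templates = [("box", [1.0, 0.0]), ("line", [0.0, 1.0]), ("box", [0.8, 0.2])]
weights = {"box": [1.0, -1.0], "line": [-1.0, 1.0]}
print(nearest_template(stroke, templates))  # box
print(linear_classify(stroke, weights))     # box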
