Thursday, October 25, 2007

Oltmans -- ASSISTANCE

Oltmans, Michael, and Randall Davis. "Naturally Conveyed Explanations of Device Behavior." PUI 2001.

Summary



Oltmans created a system called ASSISTANCE that takes a recognized sketch (recognized by the ASSIST program) and speech input (via IBM's ViaVoice) and constructs causal links between the bodies in a mechanical engineering sketch. The example given in the paper is a Rube Goldberg machine involving several bodies, springs, a pulley, and levers. The system is given drawn force arrows that indicate direction, plus spoken relationships that supply more cause/effect information.

The point of the system is that most current design tools, like CAD, require the user to put far too much thought and detail into the early phases of a design. For instance, when trying to create a Rube Goldberg machine to crack an egg (the example above) that involves a spring, you don't want to have to specify things like the spring constant or the elasticity of the various bodies. Instead, you just want a rough idea to see if things work. Using ASSISTANCE, you first sketch a diagram, which is interpreted offline by another program, ASSIST. You then annotate the diagram with multi-modal explanations of the behavior (not the specific parameters) of the system, using drawings (arrows), speech, and pointing.

Given these explanation annotations, ASSISTANCE adds them as facts to a rule-based system. The system draws consistent conclusions from the set of behaviors, yielding a series of causal structures. These structures describe how things happen in the device (body 1 hits body 2 and causes it to move, and so on). Searching for a consistent set of causal links can be time-consuming, but the authors find it to be very quick most of the time.
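To make the flavor of that reasoning concrete, here is a minimal sketch of my own (not code from the paper) of forward-chaining over a few hypothetical behavior facts to derive a causal chain; the relation names and rules are invented stand-ins for ASSISTANCE's actual vocabulary:

# Hypothetical facts: one from a drawn force arrow, the rest from sketch
# geometry (the kind of thing ASSIST would supply).
facts = {
    ("pushes", "spring", "ball1"),
    ("touches", "ball1", "lever"),
    ("touches", "lever", "ball2"),
}

def forward_chain(facts):
    """Repeatedly apply two toy rules until no new facts are derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        new = set()
        for rel, a, b in derived:
            if rel == "pushes":
                # Rule 1: if A pushes B, then A causes B to move.
                new.add(("causes-motion", a, b))
            if rel == "causes-motion":
                # Rule 2: a body set in motion pushes whatever it touches.
                for rel2, c, d in derived:
                    if rel2 == "touches" and c == b:
                        new.add(("pushes", b, d))
        if not new <= derived:
            derived |= new
            changed = True
    return {f for f in derived if f[0] == "causes-motion"}

print(forward_chain(facts))
# yields causes-motion links spring -> ball1 -> lever -> ball2 (in some order)

The real system keeps these deductions in a truth maintenance system so inconsistent interpretations can be retracted; this sketch only shows the chaining idea.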

No experimental results on the system's efficacy were given beyond the authors' own experience with it.

Discussion



Speaking seems like a nice way to get more information to the design program. This is the first paper I've seen (not necessarily the first written) in which sketching and speaking are combined in a multi-modal interface. It seems like some of the issues that plague both fields (context, intention, etc.) could be resolved by one mode of input helping to clarify the other. Generally, the more information we have (a user speaking about the sketch rather than just the static sketch), the better equipped we are to do a good job. However, the obvious problem is that both speech and sketch recognition are Hard Problems (TM) and can be very daunting on their own, let alone when combined. Luckily for Michael, he used existing software for both.

The authors say that their system tends not to require exponential time to search the rule-based system for a consistent set of causal links. However, the worst-case running time is indeed exponential. It seems like they're just getting lucky because their domain is very small and limited. How much would adding a richer vocabulary and more domains increase the complexity of the system? Obviously, exponentially. Would those additions mean that the truth maintenance system more often encountered exponential running times? Rule-based / knowledge-based deductive systems are neat, but they are extremely complex and in general are working on problems that are at least NP-hard. Expecting to get good performance out of them, using anything other than an approximation, is foolhardy.
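To see why the worst case blows up: if a device has n ambiguous interactions and each can be read two ways (say, as cause or as effect), a naive search for a consistent interpretation may examine up to 2^n combinations. A toy illustration of my own (not the paper's algorithm):

from itertools import product

def find_consistent(options_per_interaction, is_consistent):
    # options_per_interaction: one list of possible readings per interaction
    # is_consistent: predicate over a complete assignment (the consistency check)
    for assignment in product(*options_per_interaction):  # up to 2^n for binary choices
        if is_consistent(assignment):
            return assignment
    return None

# Example: 3 interactions, each "cause" or "effect"; require exactly one root cause.
choices = [["cause", "effect"]] * 3
print(find_consistent(choices, lambda a: a.count("cause") == 1))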

I wish there were results, at least a usability study where people rated the system. But alas, there were not. I really don't want to hear the authors' opinions about their own system, since they want to get published and aren't going to say anything like "Our system didn't work at all and was completely counter-intuitive." Not to say that it is, or that that's how I'd feel if I tried it out; it's hyperbole.
