Tuesday, October 16, 2007

Herot - Graphical Input Through Machine Recognition of Sketches

Herot, Christopher. "Graphical Input Through Machine Recognition of Sketches." SIGGRAPH 1976: 97-102.

Summary



Herot's paper, from the 1970s, describes a system that seeks to find a way of recognizing sketching apart from the semantics of context and domain (i.e. recognizing low-level primitives and combining them hierarchically), a way of using domain context to construct higher-order shapes, and finally a way to allow the user direct involvement in the recognition process to tune the system's capabilities to that user's style.

Herot's system, ancient by modern standards, consisted of a microcomputer, several types of tablets for sketching input, and even a storage tube! Input came from the passive tablet, over which a large piece of paper was taped and drawn on using a modified graphite pencil. The pencil's eraser allowed for drawing a "negative line" to remove points from the system. The pen's position was sampled at a constant rate determined by a user-programmable variable. The HUNCH system used these points to perform corner detection based on speed and curvature data. HUNCH fit lines to strokes, and a separate routine, CURVIT, fit B-splines. Herot found, however, that setting the thresholds for distinguishing lines from curves was difficult and tended to be user-dependent.
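To make the speed-plus-curvature idea concrete, here is a minimal sketch in Python. It is not HUNCH's actual algorithm, which the summary above does not spell out; the function name `find_corners` and both thresholds (`speed_frac`, `angle_thresh`) are assumptions chosen for illustration. Because the pen is sampled at a constant rate, the distance between consecutive samples stands in for pen speed, and the turning angle at each sample stands in for curvature.

```python
import math

def find_corners(points, speed_frac=0.5, angle_thresh=math.radians(45)):
    """Return indices of candidate corners in a stroke sampled at a constant rate.

    Hypothetical illustration only: a sample is flagged when the pen is both
    slow (below speed_frac * mean speed) and turning sharply (above angle_thresh).
    """
    n = len(points)
    if n < 3:
        return []
    # Constant-rate sampling: spacing between samples is proportional to speed.
    step = [math.dist(points[i], points[i + 1]) for i in range(n - 1)]
    mean_speed = sum(step) / len(step)
    corners = []
    for i in range(1, n - 1):
        speed = (step[i - 1] + step[i]) / 2
        # Direction of travel into and out of sample i.
        a_in = math.atan2(points[i][1] - points[i - 1][1],
                          points[i][0] - points[i - 1][0])
        a_out = math.atan2(points[i + 1][1] - points[i][1],
                           points[i + 1][0] - points[i][0])
        # Absolute turning angle, wrapped into [0, pi].
        turn = abs((a_out - a_in + math.pi) % (2 * math.pi) - math.pi)
        if speed < speed_frac * mean_speed and turn > angle_thresh:
            corners.append(i)
    return corners

# An "L" drawn by a user who slows down going into the corner.
slow_corner = [(0, 0), (4, 0), (8, 0), (10, 0), (11, 0),
               (11, 1), (11, 3), (11, 7), (11, 11)]
# The same kind of turn drawn at full speed.
fast_corner = [(0, 0), (4, 0), (8, 0), (8, 4), (8, 8)]
print(find_corners(slow_corner), find_corners(fast_corner))
```

With these default thresholds, the user who slows into the corner gets it detected, while the one who whips around it at full speed does not. That is one way to see why a single setting would not suit every user's drawing style.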

Herot's system also performed endpoint combination, called latching. In the first iteration, this used a fixed radius and joined any endpoints within that radius. This gave error-prone results when the drawing was at a smaller scale than the radius. He tried to take the speed of the strokes into consideration as a measure of how much the user intended the endpoints to meet, but still had problems, especially with incorrectly latching 3D figures drawn on the 2D tablet. He handled overtracing by trying to merge several lines into one.
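The fixed-radius version of latching, and its failure at small scales, can be sketched as follows. This is an illustrative guess at the behavior described, not Herot's code: the function name `latch`, the use of union-find to group endpoints, and snapping each group to its centroid are all assumptions.

```python
import math

def latch(endpoints, radius):
    """Snap endpoints lying within `radius` of each other to a shared centroid.

    Hypothetical sketch: grouping is transitive (A near B, B near C puts all
    three in one group), which is an assumption, not something the paper states.
    """
    parent = list(range(len(endpoints)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(endpoints)):
        for j in range(i + 1, len(endpoints)):
            if math.dist(endpoints[i], endpoints[j]) <= radius:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(endpoints)):
        groups.setdefault(find(i), []).append(i)

    snapped = list(endpoints)
    for members in groups.values():
        cx = sum(endpoints[m][0] for m in members) / len(members)
        cy = sum(endpoints[m][1] for m in members) / len(members)
        for m in members:
            snapped[m] = (cx, cy)
    return snapped

# Two nearly touching endpoints are joined; the distant one is left alone.
joined = latch([(0, 0), (0.4, 0.2), (10, 10)], radius=1)
# A unit square drawn smaller than the radius collapses to a single point.
collapsed = latch([(0, 0), (1, 0), (1, 1), (0, 1)], radius=2)
print(joined, collapsed)
```

The second call shows the scale problem: once the whole figure fits inside the latching radius, every corner is treated as "the same point".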

The second part of his system incorporates context from the architectural domain to give the sketch system more understanding. Basically, he seeks to build a bottom-up hierarchical recognizer that uses context to put things together. He also looks at top-down approaches, finding a "room" by looking for its "walls". He notes the need for some sort of ranking system, or something similar, in order to allay the effects of "erroneous or premature matches."

Finally, Herot constructs an interactive system where the user can help guide the recognition decisions and modify the system's operating parameters through feedback so it adjusts to the user's personal drawing style.

Discussion



Wow! Curvature and speed data used for corner detection in the 1970s. So I guess Sezgin's work wasn't all that groundbreaking then? Sure, he did it on modern computers and did a much better job of it, but is that an artifact of improved technologies or his mighty powers as a researcher? Seeing this reference to curvature data roughly 25 years before Sezgin's paper, I'm tending to lean toward simple advances in computing power and the level of knowledge in the field. Not to dismiss Sezgin's mightiness, on any account. He did things I could never do. But even he stands on the shoulders of his predecessors.

A nugget of brilliance:
A human observer resolves these ambiguities through the application of personal knowledge and years of learning how to look at pictures.... Before a machine can effectively communicate with a human user about a sketch [sic] it must possess a similar body of knowledge and experience.

Exactly! People keep crying out and gnashing their teeth for systems that capture human intention. They want systems that can perform better than humans! For simple systems and domains this is not a problem. In the future, I'm sure we can get into harder domains as the horsepower of the computers and the intelligence of the algorithms increase. But humans grow up and learn these things over 20+ years with constant reinforcement learning, and they automatically incorporate things like context and prior knowledge that a computer can never know (that's a lot of knowledge to program and sift through). That's not something you can simply program and execute as a method call.

The construction of the hierarchical system was interesting to read over, seeing as how everything he called for now exists in some form or fashion. However, Herot was very pessimistic, and ultimately incorrect, in claiming that hierarchical learning would require context (primitives can be learned automatically, as in Paulson's recognizer), that context is required "at the lowest levels," and that successful approaches would require knowledge-based systems (i.e. artificial intelligence, which is not the case anymore).

Lastly, it would appear Herot's tablet is passive since it just uses a pen, probably some sort of pressure sensitivity on a narrow tip. Does this prevent me from resting my arms/hand on it while I draw?

1 comment:

Anonymous said...

I'm glad you found this paper useful after all these years.

Back in the 1970s we were still naïve enough to believe that generalized artificial intelligence was only a decade or two in the future. While AI continues to be in the future, I am glad to see that more pragmatic approaches have resulted in concrete benefits.

To answer your question about the input hardware, the pen transmitted an RF signal which was picked up by a grid of wires in the tablet, giving the XY coordinates. The pressure on the stylus was mechanically transmitted from the pen point to a load cell in the body of the pen. Thus, the operator could rest a hand on the tablet without affecting the Z axis. The drawing software varied the thickness of the displayed line as a function of the pressure, providing accurate feedback to the user and thus keeping the pressure within the operating range of the load cell.

A later version of the system used a finger pointing device and strain gauges to measure the pressure on the display surface itself. Since the display surface was vertical, resting the hand was not an issue.