Thursday, February 19, 2009

Shilman - Statistical Visual Language

@INPROCEEDINGS{shilman2002statisticalVisual,
author = "Michael Shilman and Hanna Pasula and Stuart Russell and Richard Newton",
title = "Statistical visual language models for ink parsing",
booktitle = proc # aaai # "Spring Symposium on Sketch Understanding",
year = "2002",
pages = "126--132",
publisher = "AAAI Press",
abstract = "In this paper we motivate a new technique for automatic recognition of hand-sketched digital ink. By viewing sketched drawings as utterances in a visual language, sketch recognition can be posed as an ambiguous parsing problem. On this premise we have developed an algorithm for ink parsing that uses a statistical model to disambiguate. Under this formulation, writing a new recognizer for a visual language is as simple as writing a declarative grammar for the language, generating a model from the grammar, and training the model on drawing examples. We evaluate the speed and accuracy of this approach for the sample domain of the SILK visual language and report positive initial results."
}


Built on top of SILK (Landay et al.) to extend its recognition capabilities. Low-level primitives are recognized with Rubine's algorithm and feature set. Higher-level components are built from these low-level primitives and the visual constraints placed on them. Constraints include:
  1. Distance, DeltaX, DeltaY, Overlap - spatial relations
  2. Angle
  3. WidthRatio, HeightRatio - size relations
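The notes do not define these constraints precisely. The following is a rough sketch that assumes each low-level primitive is summarized by its axis-aligned bounding box. It shows one way to compute the pairwise features. The `Box` type and the exact formulas are hypothetical, including overlap as intersection area over the smaller box's area, and the paper's definitions may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical bounding box of a low-level primitive (screen coords, y grows down)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width, assumed > 0
    h: float  # height, assumed > 0

    @property
    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)


def constraint_features(a: Box, b: Box) -> dict:
    """Illustrative pairwise constraint features between primitives a and b."""
    (ax, ay), (bx, by) = a.center, b.center
    dx, dy = bx - ax, by - ay
    # Overlap: intersection area divided by the smaller box's area (an assumption).
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    overlap = (ix * iy) / min(a.w * a.h, b.w * b.h)
    return {
        "Distance": math.hypot(dx, dy),               # center-to-center distance
        "DeltaX": dx,                                 # horizontal offset of b from a
        "DeltaY": dy,                                 # vertical offset of b from a
        "Overlap": overlap,
        "Angle": math.degrees(math.atan2(dy, dx)),    # direction from a to b
        "WidthRatio": a.w / b.w,
        "HeightRatio": a.h / b.h,
    }
```

For two 10x10 boxes side by side, `constraint_features(Box(0, 0, 10, 10), Box(20, 0, 10, 10))` gives a distance of 20, zero overlap, an angle of 0, and size ratios of 1.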
The constraints use hard-coded threshold ranges, which are expressed as Gaussian distributions to give p(feature | label). Here a feature is the value of a constraint, and each piece of digital ink in the training data has a label assigned to it. The likelihood p(feature | label) is learned from this training data, while the priors p(feature) and p(label) are either learned from data or set empirically on a best-guess basis. Sketches are then labeled by the MAP criterion: the chosen high-level label is the one that maximizes p(high-level label | features, low-level labels).
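The following is a minimal Python sketch of the probabilistic formulation. It combines Gaussian constraint likelihoods with label priors by Bayes' rule and picks the MAP label. It assumes the features are conditionally independent given the label, a naive Bayes simplification the notes do not state. It also omits the conditioning on low-level labels. The label names, Gaussian parameters, and priors are made-up placeholders, not values from the paper.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


# Hypothetical per-label Gaussians over constraint features: feature -> (mean, std).
# In the paper these would be learned from training drawings; these numbers are invented.
MODELS = {
    "button": {"WidthRatio": (3.0, 0.8), "Overlap": (0.9, 0.1)},
    "scrollbar": {"WidthRatio": (0.2, 0.1), "Overlap": (0.5, 0.2)},
}
PRIORS = {"button": 0.6, "scrollbar": 0.4}  # hypothetical p(label)


def posterior(features):
    """p(label | features) via Bayes' rule, assuming conditionally independent features."""
    joint = {}
    for label, params in MODELS.items():
        likelihood = 1.0
        for name, (mean, std) in params.items():
            likelihood *= gaussian_pdf(features[name], mean, std)  # p(feature | label)
        joint[label] = PRIORS[label] * likelihood
    evidence = sum(joint.values())  # p(features), the normalizing constant
    return {label: p / evidence for label, p in joint.items()}


def map_label(features):
    """Return the label with the highest posterior probability."""
    post = posterior(features)
    return max(post, key=post.get)
```

With these placeholder models, `map_label({"WidthRatio": 2.8, "Overlap": 0.85})` picks `"button"`. Since p(features) is the same for every label, it only rescales the posteriors and does not change which label wins.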

Rather than evaluating every possible set of ink strokes to find the combination of features and low-level labels that maximizes the MAP criterion, the authors propose a simple ink parsing algorithm. It processes one stroke at a time and considers only the groupings relevant to the new stroke. The parse tree is pruned using cutoff values on the constraint posteriors.
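The following is a rough sketch of the incremental idea. It assumes each stroke already carries a low-level label, for example from a Rubine-style classifier. It also assumes a toy grammar in which each high-level label is built from a fixed multiset of low-level labels. The function `parse_incrementally`, the `productions` format, and the `score` callback are all hypothetical. A real implementation would grow a parse tree and compute scores from the constraint posteriors, rather than collecting a flat list of hypotheses.

```python
from itertools import combinations


def parse_incrementally(strokes, productions, score, cutoff, max_group=3):
    """Hypothetical incremental grouping with posterior-cutoff pruning.

    strokes: list of (stroke_id, low_level_label) in drawing order.
    productions: high-level label -> sorted tuple of the low-level labels it is built from.
    score(label, group): posterior-like value in [0, 1] for reading `group` as `label`.
    Returns the surviving (label, group, score) hypotheses.
    """
    seen = []        # ids of strokes processed so far
    labels = {}      # stroke id -> low-level label
    hypotheses = []
    for sid, low in strokes:
        labels[sid] = low
        # Only consider groupings that contain the newly added stroke.
        for size in range(max_group):
            for others in combinations(seen, size):
                group = others + (sid,)
                kinds = tuple(sorted(labels[s] for s in group))
                for high, body in productions.items():
                    if kinds != body:
                        continue
                    s = score(high, group)
                    if s >= cutoff:  # prune low-posterior interpretations
                        hypotheses.append((high, group, s))
        seen.append(sid)
    return hypotheses
```

With a rectangle stroke followed by two lines, a `"button"` rule built from one `"rect"`, and a `"scrollbar"` rule built from a `"line"` plus a `"rect"`, each new stroke is matched only against groupings that include it. Any grouping whose score falls below `cutoff` is discarded instead of being expanded further.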

The authors vary the threshold value and achieve a maximum of about 80% stroke-level accuracy and 90% stroke-level precision@3.
