Tuesday, November 6, 2007

Sharon and van de Panne -- Constellation Models

Sharon, D., and M. van de Panne. "Constellation Models for Sketch Recognition." EUROGRAPHICS Workshop on Sketch-Based Interfaces and Modeling, 2006.

Summary



Sharon and van de Panne's approach to sketch recognition operates on a set of shapes, each drawn with a single stroke. Each shape is assigned a label via maximum-likelihood fits to Gaussian distributions trained from example sketches: the label whose Gaussian gives the highest likelihood for the shape's features is assigned to that shape. Pairs of shapes are also compared against each other, and Gaussians are trained for all pairings (pruned down to pairings between mandatory shapes, as opposed to shapes deemed optional in a sketch). Labels are chosen so that the likelihoods of the shape pairings also remain high.

Per-stroke features include the center of the bounding box (x, y coordinates), the length of the bounding box's diagonal, and the cosine of that diagonal's angle. Features between pairs of strokes include the differences between the center points (delta x, delta y) and the distance from the endpoint of one stroke to the closest point on the other stroke (plus the same feature in the other direction). These feature vectors are computed, and four-dimensional Gaussians are trained, with means and diagonal covariance matrices estimated from the training data. The trained Gaussians are then used to assign likelihoods to the feature vectors of test examples.
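To make that concrete, here is a minimal Python sketch of the per-stroke and pairwise features described above, plus a diagonal-covariance Gaussian fit and log-density. The function names, the (n, 2) point-array input format, and the small variance floor are my own illustrative choices, not the authors' implementation.

    import numpy as np

    def stroke_features(points):
        """Per-stroke features: bounding-box center (x, y), diagonal length,
        and cosine of the diagonal's angle. `points` is an (n, 2) array of
        stroke sample points (assumed input format)."""
        points = np.asarray(points, dtype=float)
        lo, hi = points.min(axis=0), points.max(axis=0)
        cx, cy = (lo + hi) / 2.0
        dx, dy = hi - lo
        diag = np.hypot(dx, dy)
        cos_angle = dx / diag if diag > 0 else 1.0
        return np.array([cx, cy, diag, cos_angle])

    def pair_features(points_a, points_b):
        """Pairwise features: difference of bounding-box centers (dx, dy) and
        the distance from each stroke's endpoint to the closest point on the
        other stroke."""
        points_a = np.asarray(points_a, dtype=float)
        points_b = np.asarray(points_b, dtype=float)
        ca, cb = stroke_features(points_a)[:2], stroke_features(points_b)[:2]
        d_ab = np.min(np.linalg.norm(points_b - points_a[-1], axis=1))
        d_ba = np.min(np.linalg.norm(points_a - points_b[-1], axis=1))
        return np.array([ca[0] - cb[0], ca[1] - cb[1], d_ab, d_ba])

    def fit_label_model(training_vectors):
        """Estimate a mean and per-dimension variance (diagonal covariance)
        from the training feature vectors for one label."""
        X = np.asarray(training_vectors, dtype=float)
        return X.mean(axis=0), X.var(axis=0) + 1e-6  # floor avoids zero variance

    def log_gaussian(x, mean, var):
        """Log-density of a diagonal-covariance Gaussian at feature vector x."""
        return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)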

Label assignments are searched for using a branch-and-bound method. To avoid a brute-force search that is exponential (the number of labels raised to the number of strokes), several bounding heuristics are used to reduce the search space. One is to assign all possible mandatory labels and trim branches with smaller likelihoods; this is still exponential, now in the number of mandatory labels raised to the number of strokes. Another is a multi-pass thresholding algorithm: start at some high threshold and cut off any developing branch whose likelihood falls below it; if no complete labeling is found, iterate with progressively lower thresholds until a full labeling is assigned. A third approach sets a minimum likelihood for any single node or edge in the search tree, so that assigning a label whose likelihood falls below the threshold terminates that branch immediately. This per-assignment version is the multi-pass thresholding the authors implement (as opposed to thresholding the likelihood of the labeling-so-far). The authors also discuss hard constraints, such as forcing mouth parts to appear below nose parts.
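The per-assignment, multi-pass thresholding is easy to sketch. Below is a simplified Python version under my own assumptions: it does a depth-first search and returns the first complete labeling that survives the current threshold (rather than the true maximum-likelihood labeling the paper's branch-and-bound finds), and the `unary_loglik`/`pair_loglik` callbacks and the threshold schedule are placeholders, not values from the paper.

    import math

    def search_labeling(strokes, labels, unary_loglik, pair_loglik,
                        thresholds=(-2.0, -5.0, -10.0, -math.inf)):
        """Multi-pass thresholded search: prune any assignment whose unary or
        pairwise log-likelihood falls below the current threshold; if no
        complete labeling survives, retry the search with a lower threshold."""

        def extend(assigned, remaining, threshold):
            if not remaining:
                return assigned  # every stroke has received a label
            stroke = remaining[0]
            for label in labels:
                if unary_loglik(stroke, label) < threshold:
                    continue  # prune: this single assignment is too unlikely
                if any(pair_loglik(s, l, stroke, label) < threshold
                       for s, l in assigned):
                    continue  # prune: a pairwise fit with an earlier stroke is too weak
                result = extend(assigned + [(stroke, label)], remaining[1:], threshold)
                if result is not None:
                    return result
            return None

        for threshold in thresholds:
            labeling = extend([], list(strokes), threshold)
            if labeling is not None:
                return labeling
        return None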

The authors don't give any recognition results, only the amount of speed-up from the multi-pass algorithm. I guess it doesn't do so hot.

Discussion




  • The features they use say nothing about the contents of the bounding box. Something like Paulson's MARQS recognizer would likely give better context-free recognition that doesn't depend solely on the size and aspect of the stroke. This would probably improve recognition and decrease the amount of time spent labeling the parts, especially if employed in a bottom-up fashion.
  • On computing the model's likelihood:

    1. They can use the log-likelihood and turn the products into sums to save computation time; maximizing log(x) also maximizes x (see the first sketch after this list).
    2. They assume independence between the probabilities (otherwise they couldn't multiply them together). This is usually a safe assumption and greatly reduces computation. But, for instance, does assigning the label "nose" to one shape have any effect on the label "mouth" for another shape? Possibly, but learning joint probabilities requires a great many training examples. I advocate the independence assumption, but wanted to point this out since they don't state it explicitly.
    3. They assume uniform priors. I don't agree with this assumption at all. They say it's because of a lack of training data, but they are already training Gaussians with that same limited data, so they might as well estimate priors too. This would be helpful for things like optional shapes, where an eyebrow might not appear at all. So if we have something we think might be an eyebrow (especially since their individual shape recognition only uses bounding-box information), we might want to weight that likelihood with the prior (again, see the first sketch after this list).

  • They have 20-60 training examples per shape. That's a lot! Plus they don't give recognition results, so we can only assume their method did horribly. I guess having that much data didn't help. Lack of training data is always an issue, so don't complain about it: if you can't find a good way to train priors or hard constraints with the limited data you have, you're doing it wrong.
  • Different classifiers work better for different amounts of training data. They don't have enough to fully describe a set of multivariate Gaussians and priors for a Bayes classifier (ML or MAP). I think something like k-nearest neighbors would work well, using the feature vectors as points in a high-dimensional space (see the second sketch after this list).
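To illustrate points 1 and 3, here is a minimal Python sketch of scoring a label with a log-likelihood sum and an optional log-prior term. The diagonal-Gaussian log-density and the `models`/`log_priors` dictionaries are assumptions of mine, not anything from the paper; leaving `log_priors` as None gives the ML rule the authors use, while passing log class frequencies from the training set gives the MAP rule argued for above.

    import numpy as np

    def log_gaussian(x, mean, var):
        """Diagonal-covariance Gaussian log-density (same form as the earlier sketch)."""
        return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

    def best_label(x, models, log_priors=None):
        """Pick the label maximizing log-likelihood (ML) or log-likelihood plus
        log-prior (MAP). `models` maps label -> (mean, var); `log_priors` maps
        label -> log P(label), e.g. the log of training-set class frequencies."""
        scores = {}
        for label, (mean, var) in models.items():
            score = log_gaussian(x, mean, var)
            if log_priors is not None:
                score += log_priors[label]  # products become sums in log space
            scores[label] = score
        return max(scores, key=scores.get)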

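And for the k-nearest-neighbors suggestion in the last bullet, a minimal sketch using plain Euclidean distance in feature space; the choice of k, the absence of feature scaling, and the majority-vote tie-breaking are all arbitrary illustrative choices.

    import numpy as np

    def knn_label(x, train_vectors, train_labels, k=5):
        """Label feature vector x by majority vote among its k nearest training
        examples (Euclidean distance in feature space)."""
        X = np.asarray(train_vectors, dtype=float)
        dists = np.linalg.norm(X - np.asarray(x, dtype=float), axis=1)
        nearest = np.argsort(dists)[:k]
        votes = [train_labels[i] for i in nearest]
        return max(set(votes), key=votes.count)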
1 comment:

Grandmaster Mash said...

I worked on a project that used log-likelihoods, and it definitely makes the tractability problems they were having seem odd. Still, I do like their bounding, since our project tried a similar graph method to find circuit-diagram sketches that could contain hundreds of strokes. The actual recognition/classification was relatively fast (30 seconds at most), but training could take over a day for a 3-label case.

And yes, a lack of results is disturbing.