Thursday, February 19, 2009

Landay - SILK

Interactive Sketching for the Early Stages of User Interface Design, James Landay



@inproceedings{landay1995silk,
author = "James Landay and Brad Myers",
title = "Interactive sketching for the early stages of user interface design",
booktitle = proc # chi,
year = "1995",
isbn = "0-201-84705-1",
pages = "43--50",
location = "Denver, Colorado, United States",
doi = "http://doi.acm.org/10.1145/223904.223910",
publisher = "ACM Press/Addison-Wesley Publishing Co.",
address = "New York, NY, USA",
abstract = "Current interactive user interface construction tools are often more of a hindrance than a benefit during the early stages of user interface design. These tools take too much time to use and force designers to specify more of the design details than they wish at this early stage. Most interface designers, especially those who have a background in graphic design, prefer to sketch early interface ideas on paper or on a whiteboard. We are developing an interactive tool called SILK that allows designers to quickly sketch an interface using an electronic pad and stylus. SILK preserves the important properties of pencil and paper: a rough drawing can be produced very quickly and the medium is very flexible. However, unlike a paper sketch, this electronic sketch is interactive and can easily be modified. In addition, our system allows designers to examine, annotate, and edit a complete history of the design. When the designer is satisfied with this early prototype, SILK can transform the sketch into a complete, operational interface in a specified look-and-feel. This transformation is guided by the designer. By supporting the early phases of the interface design life cycle, our tool should both ease the development of user interface prototypes and reduce the time needed to create a final interface. This paper describes our prototype and provides design ideas for a production-level system.",
}

Proposal of SILK, a prototyping tool for common WIMP GUIs that lets designers sketch out an interface. The sketch is recognized, and parts of the GUI can be made interactive so the designer can work out behavior. This allows quick, rough iterative design with a "sketchy" look, which is shown to foster creativity in design brainstorming sessions.

Editing of the sketches is done through a modal gesture interface; editing mode is entered by pressing the side button on the stylus.

Allows for design history. The user can save multiple versions of the same interface sketch, even viewing more than one at once and copying parts from one onto another. Additional layers of sketch may be added to a drawing as 'annotation' layers, on which recognition is not performed.

Recognition is handled via Rubine's algorithm, limiting the drawings to single-stroke primitives. To form high-level shapes, the primitives are put together with three basic constraints (relations) in a heuristic rule-based recognizer (a rough sketch of this idea follows the list below). The constraints are:
  1. Contains
  2. Near
  3. Sequence of components
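
To make the combination step concrete, here is a minimal Python sketch of the idea. This is my own illustration (all names such as Primitive and recognize_widgets are invented), not SILK's implementation, and it only shows the 'contains' and 'near' relations with a single toy rule.

    # Illustration only -- not SILK's code. Rubine's recognizer is assumed to
    # have already labeled each single-stroke primitive with a kind and a
    # bounding box; heuristic rules then combine primitives into widgets.
    from dataclasses import dataclass

    @dataclass
    class Primitive:
        kind: str        # e.g. "rectangle", "squiggle" (stands in for text)
        bbox: tuple      # (x_min, y_min, x_max, y_max)

    def contains(outer, inner):
        ox0, oy0, ox1, oy1 = outer.bbox
        ix0, iy0, ix1, iy1 = inner.bbox
        return ox0 <= ix0 and oy0 <= iy0 and ix1 <= ox1 and iy1 <= oy1

    def near(a, b, tol=20.0):
        ax0, ay0, ax1, ay1 = a.bbox
        bx0, by0, bx1, by1 = b.bbox
        dx = max(bx0 - ax1, ax0 - bx1, 0.0)   # horizontal gap between boxes
        dy = max(by0 - ay1, ay0 - by1, 0.0)   # vertical gap between boxes
        return (dx * dx + dy * dy) ** 0.5 <= tol

    def recognize_widgets(primitives):
        """Toy rule: a squiggle contained in a rectangle looks like a button.
        A fuller rule set would also use near() and the sequence-of-components
        relation, e.g. a vertical sequence of such pairs forming a menu."""
        widgets = []
        for outer in primitives:
            for inner in primitives:
                if outer is not inner and outer.kind == "rectangle" \
                        and inner.kind == "squiggle" and contains(outer, inner):
                    widgets.append(("button", outer, inner))
        return widgets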

The system is allowed to revise recognition results based on new strokes being drawn, or previous strokes being edited (including deletion). The recognizer can propose alternatives in an n-best list.

Shilman - Statistical Visual Language

@INPROCEEDINGS{shilman2002statisticalVisual,
author = "Michael Shilman and Hanna Pasula and Stuart Russell and Richard Newton",
title = "Statistical visual language models for ink parsing",
booktitle = proc # aaai # "Spring Symposium on Sketch Understanding",
year = "2002",
pages = "126--132",
publisher = "AAAI Press",
abstract = "In this paper we motivate a new technique for automatic recognition of hand-sketched digital ink. By viewing sketched drawings as utterances in a visual language, sketch recognition can be posed as an ambiguous parsing problem. On this premise we have developed an algorithm for ink parsing that uses a statistical model to disambiguate. Under this formulation, writing a new recognizer for a visual language is as simple as writing a declarative grammar for the language, generating a model from the grammar, and training the model on drawing examples. We evaluate the speed and accuracy of this approach for the sample domain of the SILK visual language and report positive initial results."
}


Built on top of SILK (Landay et al.) to extend its recognition capabilities. Low-level primitives are recognized with Rubine's algorithm and his features. Higher-level components are constructed from low-level primitives and the visual constraints placed on them. Constraints include:
  1. Distance, DeltaX, DeltaY, Overlap - spatial relations
  2. Angle
  3. WidthRatio, HeightRatio - size relations
The constraints use hard-coded threshold ranges. The ranges are expressed as Gaussian distributions, giving p(feature | label); here a feature is a constraint value, and the digital ink has a training label assigned to it. The priors p(feature) and p(label) are either learned from data or set empirically on a best-guess basis, while the likelihood p(feature | label) is learned from training data. Labels are assigned to sketches using the MAP criterion p(high-level label | features, low-level labels).
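
A rough sketch of the scoring step, assuming the constraint values are treated as conditionally independent given the label. Everything here (the numbers and names like LIKELIHOODS and map_score) is hypothetical illustration, not the paper's code.

    import math

    def gaussian_pdf(x, mean, std):
        return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

    # Per (label, constraint) mean/std would be learned from training data --
    # the numbers below are made up for illustration.
    LIKELIHOODS = {
        ("button", "WidthRatio"): (3.0, 0.8),
        ("button", "Distance"):   (5.0, 2.0),
    }
    PRIORS = {"button": 0.3, "scrollbar": 0.1}   # p(label), also made up

    def map_score(label, constraint_values):
        """Unnormalized posterior: p(label) * prod_i p(constraint_i | label)."""
        score = PRIORS.get(label, 1e-6)
        for name, value in constraint_values.items():
            mean, std = LIKELIHOODS.get((label, name), (0.0, 10.0))
            score *= gaussian_pdf(value, mean, std)
        return score

    def best_label(constraint_values, labels=("button", "scrollbar")):
        # MAP decision: pick the label with the highest (unnormalized) posterior.
        return max(labels, key=lambda lbl: map_score(lbl, constraint_values))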

Rather than trying all possible groupings of ink strokes to find the optimal set of features and low-level labels for the MAP criterion, the authors propose a simple ink parsing algorithm. The algorithm takes one stroke at a time and only considers groupings that are relevant to the new symbol. The parse tree is pruned using cutoff values on the constraint posteriors.
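
The incremental grouping might look something like the sketch below; restricting attention to candidate groups that end at the newest stroke is my own simplification of "groupings relevant to the new symbol."

    # Hypothetical simplification of the incremental parse: form only candidate
    # groups that include the newest stroke, and prune any group whose score
    # (e.g. the best MAP posterior over labels) falls below a cutoff.
    def parse_incrementally(strokes, score_group, cutoff=1e-3, max_group=4):
        accepted = []                    # (score, group) pairs kept so far
        seen = []
        for stroke in strokes:
            seen.append(stroke)
            for size in range(1, min(max_group, len(seen)) + 1):
                group = tuple(seen[-size:])     # always includes the new stroke
                score = score_group(group)
                if score >= cutoff:             # pruning by cutoff value
                    accepted.append((score, group))
        return accepted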

The authors play with the threshold value and can achieve a maximum of about 80% stroke-level accuracy and 90% stroke-level precision @ 3.

Saturday, May 10, 2008

Eisenstein - Device independence and extensibility

Eisenstein, J.; Ghandeharizadeh, S.; Golubchik, L.; Shahabi, C.; Yan, D.; Zimmermann, R., "Device independence and extensibility in gesture recognition," Proceedings of IEEE Virtual Reality 2003, pp. 207-214, 22-26 March 2003.

Summary



They're trying to make a recognition framework that is device independent, so you can plug data from any device into it. The first layer is the raw data, where each sensor value is device dependent. The second layer is a set of predicates that combine raw values into postures. The third layer is a set of temporal predicates that describe changes in posture over time. The fourth layer is a set of gestural templates that map temporal patterns of postures to specific gestures.

Sensor values from layer 1 are mapped to predicates by hand for each device. Templates are added once, combining predicates. Use a bunch of neural networks to combine predicates into temporal data, and a bunch more neural networks to combine temporal data into gestures. Train all these networks with a whole lot of data.
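
To illustrate the layering, here is a toy sketch in which hand-written predicates stand in for the trained neural networks of layers 2 through 4; every name and threshold is invented for illustration.

    def finger_bent(raw, sensor_index, threshold=0.7):
        # Layer 1 -> 2: device-specific mapping from a raw sensor value,
        # defined (or trained) separately for each device.
        return raw[sensor_index] > threshold

    def fist(raw):
        # Layer 2: a posture predicate combining several sensor values.
        return all(finger_bent(raw, i) for i in range(4))

    def fist_then_open(frames):
        # Layer 3: a temporal predicate over a sequence of raw frames.
        postures = [fist(f) for f in frames]
        return any(postures) and not postures[-1]

    GESTURE_TEMPLATES = {
        # Layer 4: gestures defined as temporal patterns of postures.
        "release": fist_then_open,
    }

    def recognize(frames):
        return [name for name, template in GESTURE_TEMPLATES.items()
                if template(frames)]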

Discussion



They omit anything involving temporal data from their experiments, so they only use the ASL letters excluding Z and J (the two that require motion). This is pretty cheesy. Also, they are shooting for device independence, but you still have to map and train the connections between raw sensor values and predicates by hand, for each device. I understand you'd have to do this for any application, but it seems to defeat their purpose. I guess their benefits come at the higher levels, where you use predicates regardless of how they were constructed.

This seems crazy complicated for /very/ little accuracy. Using neural networks to classify the static ASL letters (all but Z and J), they only get 67% accuracy. Other approaches get close to 95-99% on the same data. I guess things are a little too complex.

Schlomer - Gesture Rec with Wii Controller

Schlomer, Poppinga, Henze, Boll. Gesture Recognition with a Wii Controller (TEI 2008).

Summary



Wiimote. Acceleration data. Quantize the (x, y, z) acceleration data using k-means into a codebook of size 14. Plug the quantized data into a left-right HMM. Gestures are segmented by having the user press and hold the A button during a gesture. They can recognize about 90% of all gestures (circle, square, tennis swing) accurately.
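
A rough reconstruction of that pipeline (assumed structure, not the authors' code; HMM training via Baum-Welch is omitted and only likelihood scoring is shown):

    import numpy as np
    from sklearn.cluster import KMeans

    def build_codebook(accel_samples, k=14):
        """accel_samples: (n, 3) array of (x, y, z) acceleration readings."""
        return KMeans(n_clusters=k, n_init=10).fit(accel_samples)

    def quantize(codebook, gesture):
        """gesture: (m, 3) array for one button-press-to-release segment."""
        return codebook.predict(gesture)          # sequence of symbol indices

    def forward_log_likelihood(symbols, start_p, trans_p, emit_p):
        """Plain forward algorithm. For a left-right HMM, trans_p is upper
        triangular: a state can only stay put or move forward."""
        alpha = start_p * emit_p[:, symbols[0]]
        for s in symbols[1:]:
            alpha = (alpha @ trans_p) * emit_p[:, s]
        return np.log(alpha.sum() + 1e-300)

    # Classification: train one left-right HMM per gesture class (circle,
    # square, tennis swing) and pick the class with the highest likelihood.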

Discussion



The point of this is.... what? Wii games, like Wii tennis and bowling, are pretty darned accurate at this already.

Murayama - Spidar G&G

Summary



We have a 6 DOF haptic device called the SPIDAR G that provides force feedback. Let's put two of them together, one for each hand. We'll make people do different things, like use one hand to manipulate a target and the other to manipulate an object they have to put into the target. If they run into something, we'll give them force feedback to say they hit something. When they use two SPIDARs as opposed to just one, they can usually do things faster.

Discussion



So all they do is take one machine, hook up another, and use the two of them. And with two devices, users tend to do things a little bit faster. This is pretty obvious, since instead of just moving the object, I can now also move the target, which saves the amount of work I have to do.

I wonder if the strings of the Spidar would get in your way and limit your movement. Surely they would. Rotation would also be tough because you can't hold onto something and rotate it more than about 180 degrees.

No real evaluation performed, just a little bit of speedup data.

Lapides - 3D Tractus

Lapides, P., Sharlin, E., Sousa, M. C., and Streit, L. 2006. The 3D Tractus: A Three-Dimensional Drawing Board. In Proceedings of the First IEEE international Workshop on Horizontal interactive Human-Computer Systems (January 05 - 07, 2006). TABLETOP. IEEE Computer Society, Washington, DC, 169-176. DOI= http://dx.doi.org/10.1109/TABLETOP.2006.33

Summary



I want to draw in 3D, but all I have is this tablet PC. I know, I'll put the tablet PC on a table that moves up and down, simulating the third dimension. I'll make a drawing program with a user interface that shows the 3D drawing from a few angles so users won't get too confused. We'll use visual cues like encoding depth with line width. Also, we'll use perspective projection so the user knows which things are 'below' the current plane of the tablet PC.
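
A tiny sketch of those two depth cues with made-up constants (my illustration, not the Tractus rendering code):

    def stroke_width(depth_below_plane, base_width=3.0, falloff=0.02):
        """Strokes farther below the current drawing plane are drawn thinner."""
        return base_width / (1.0 + falloff * max(depth_below_plane, 0.0))

    def perspective_project(x, y, depth_below_plane, focal=500.0):
        """Pinhole-style scaling toward the origin: deeper points shrink,
        signalling that they sit 'below' the tablet's current plane."""
        scale = focal / (focal + max(depth_below_plane, 0.0))
        return x * scale, y * scale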

People used it and said it was neat.

Discussion



This drawing/3D modeling approach is a little more realistic than other things we've read (Holosketch or the superellipsoid clay thing) since all you need is a tablet and one of their funky elevator things. So I'll give it kudos there.

I'm not sure how simple or intuitive it is, however, to have to move your tablet PC up and down to draw in the third dimension. I question the accuracy, especially if you're trying to line things up one on top of another, since you don't really have a good idea of where things are along the Z axis. This is especially hard if what you're trying to draw exists above the current plane of the tablet PC, since their software only shows you what's below the current plane.

Neat idea, but a bit klunky. Hope someone doesn't get their legs cut off if the tractus goes berserk.

Krahnstoever - Activity Recognition with Vision and RFID

Krahnstoever, N.; Rittscher, J.; Tu, P.; Chean, K.; Tomlinson, T., "Activity Recognition using Visual Tracking and RFID," Application of Computer Vision, 2005. WACV/MOTIONS '05 Volume 1. Seventh IEEE Workshops on , vol.1, no., pp.494-500, 5-7 Jan. 2005

Summary



Person in an office or warehouse with cameras on them. Track their movements with a Monte Carlo model examining the image frames. Augment this with RFID tags embedded in all the objects the human can interact with. Do activity recognition by examining how the person is moving (vision) and what they are interacting with (RFID). RFID helps augment the visual tracking for the purposes of activity recognition.
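
A toy sketch of the fusion idea, entirely hypothetical (the paper uses a probabilistic tracking model, not a hand-written rule like this), just to make the combination concrete:

    # The visual tracker supplies a person/hand position per frame, RFID reads
    # say which tagged objects are active, and a simple rule combines the two.
    def label_activity(track_position, rfid_reads, object_locations, reach=0.5):
        """track_position: (x, y) from the Monte Carlo visual tracker.
        rfid_reads: set of tag ids detected in this time window.
        object_locations: {tag_id: (x, y)} of the tagged objects (meters)."""
        px, py = track_position
        for tag in rfid_reads:
            ox, oy = object_locations.get(tag, (float("inf"), float("inf")))
            if ((px - ox) ** 2 + (py - oy) ** 2) ** 0.5 <= reach:
                return "interacting_with_" + str(tag)
        return "no_object_interaction"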

Discussion



So they take an existing Monte Carlo visual tracking algorithm and magically throw RFID into the jar. They say this does better. Sort of a "duh" moment. Why do we let Civil Josh pick papers?