<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-3062307596356887437</id><updated>2011-07-28T07:34:15.509-05:00</updated><title type='text'>jbjohns - Haptics and Sketch Recognition</title><subtitle type='html'>Clogging the Tubes, one Blog entry at a time.

Now covers two courses/fields/disciplines, so it's more of a gesture recognition catch-22.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>79</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-2508977699682760587</id><published>2009-02-19T12:37:00.003-06:00</published><updated>2009-02-19T13:48:01.976-06:00</updated><title type='text'>Landay - SILK</title><content type='html'>&lt;h1&gt;Interactive Sketching for the Early Stages of User Interface Design, James Landay&lt;/h1&gt;&lt;br /&gt;&lt;br /&gt;@inproceedings{landay1995silk,&lt;br /&gt;    author = "James Landay and Brad Myers",&lt;br /&gt;    title = "Interactive sketching for the early stages of user interface design",&lt;br /&gt;    booktitle = proc # chi,&lt;br /&gt;    year = "1995",&lt;br /&gt;    isbn = "0-201-84705-1",&lt;br /&gt;    pages = "43--50",&lt;br /&gt;    location = "Denver, Colorado, United States",&lt;br /&gt;    doi = "http://doi.acm.org/10.1145/223904.223910",&lt;br /&gt;    publisher = "ACM Press/Addison-Wesley Publishing Co.",&lt;br /&gt;    address = "New York, NY, USA",&lt;br /&gt;    abstract = "Current interactive user interface construction tools are often more of a hindrance than a benefit during 
the early stages of user interface design. These tools take too much time to use and force designers to specify more of the design details than they wish at this early stage. Most interface designers, especially those who have a background in graphic design, prefer to sketch early interface ideas on paper or on a whiteboard. We are developing an interactive tool called SILK that allows designers to quickly sketch an interface using an electronic pad and stylus. SILK preserves the important properties of pencil and paper: a rough drawing can be produced very quickly and the medium is very flexible. However, unlike a paper sketch, this electronic sketch is interactive and can easily be modified. In addition, our system allows designers to examine, annotate, and edit a complete history of the design. When the designer is satisfied with this early prototype, SILK can transform the sketch into a complete, operational interface in a specified look-and-feel. This transformation is guided by the designer. By supporting the early phases of the interface design life cycle, our tool should both ease the development of user interface prototypes and reduce the time needed to create a final interface. This paper describes our prototype and provides design ideas for a production-level system.",&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;Proposes SILK, a prototyping tool for common WIMP GUIs that lets designers sketch out an interface. The sketch is recognized, and the recognized widgets can then be made interactive so the designer can specify their behavior. SILK supports quick, rough, iterative design with a "sketchy" look, which is reported to foster creativity in design brainstorming sessions.&lt;br /&gt;&lt;br /&gt;Sketches can be edited through a modal gesture interface; editing mode is entered by pressing the side button on the stylus.&lt;br /&gt;&lt;br /&gt;SILK also supports design history. 
The user can save multiple versions of the same interface sketch, view several at once, and copy parts from one to another. Additional sketch layers can be added to a drawing as 'annotation' layers, on which no recognition is performed.&lt;br /&gt;&lt;br /&gt;Recognition is handled by Rubine's algorithm, which limits drawings to single-stroke primitives. To form high-level shapes, a heuristic rule-based recognizer combines the primitives using three basic constraints (relations):&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Contains&lt;/li&gt;&lt;li&gt;Near&lt;/li&gt;&lt;li&gt;Sequence of components&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;The system can revise recognition results as new strokes are drawn or earlier strokes are edited (including deleted). The recognizer can propose alternatives in an n-best list.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-2508977699682760587?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/2508977699682760587/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=2508977699682760587&amp;isPopup=true' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/2508977699682760587'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/2508977699682760587'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2009/02/landay-silk.html' title='Landay - SILK'/><author><name>Joshua</name><uri>http://www.blogger.com/profile/09242972372915675569</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' 
width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-114903408727706031</id><published>2009-02-19T11:02:00.003-06:00</published><updated>2009-02-19T13:50:51.577-06:00</updated><title type='text'>Shilman - Statistical Visual Language</title><content type='html'>@INPROCEEDINGS{shilman2002statisticalVisual,&lt;br /&gt;    author = "Michael Shilman and Hanna Pasula and Stuart Russell and Richard Newton",&lt;br /&gt;    title = "Statistical visual language models for ink parsing",&lt;br /&gt;    booktitle = proc # aaai # "Spring Symposium on Sketch Understanding",&lt;br /&gt;    year = "2002",&lt;br /&gt;    pages = "126--132",&lt;br /&gt;    publisher = "AAAI Press",&lt;br /&gt;    abstract = "In this paper we motivate a new technique for automatic recognition of hand-sketched digital ink. By viewing sketched drawings as utterances in a visual language, sketch recognition can be posed as an ambiguous parsing problem. On this premise we have developed an algorithm for ink parsing that uses a statistical model to disambiguate. Under this formulation, writing a new recognizer for a visual language is as simple as writing a declarative grammar for the language, generating a model from the grammar, and training the model on drawing examples. We evaluate the speed and accuracy of this approach for the sample domain of the SILK visual language and report positive initial results."&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Builds on SILK (Landay et al.) to extend its recognition capabilities. Low-level primitives are recognized with Rubine's algorithm and feature set. Higher-level components are constructed from low-level primitives and the visual constraints placed on them. 
Constraints include:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Distance, DeltaX, DeltaY, Overlap - spatial relations&lt;/li&gt;&lt;li&gt;Angle&lt;/li&gt;&lt;li&gt;WidthRatio, HeightRatio - size relations&lt;/li&gt;&lt;/ol&gt;Each constraint is specified with a hard-coded threshold range, and these ranges are expressed as Gaussian distributions, giving p(feature | label). Here, a feature is a constraint value, and each piece of digital ink has a training label assigned to it. The priors p(feature) and p(label) are learned from data or set empirically on a best-guess basis, while the likelihood p(feature | label) is learned from training data. Labels are assigned with the MAP criterion, i.e. the chosen high-level label is the one that maximizes p(high-level label | features, low-level labels).&lt;br /&gt;&lt;br /&gt;Rather than evaluating the MAP criterion for every possible set of ink strokes (with its features and low-level labels), the authors propose a simple incremental ink-parsing algorithm. It takes one stroke at a time and only considers groupings that are relevant to the newly added stroke. 
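&lt;br /&gt;&lt;br /&gt;The following minimal Python sketch makes the Gaussian-likelihood, MAP-labeling idea above concrete. It rests on several assumptions. The two constraint features ("distance" and "angle"), the two labels ("button" and "scrollbar"), and every prior and Gaussian parameter are made up for the example. The features are also treated as conditionally independent given the label, which is a naive-Bayes simplification. This is not the authors' model or code. In their system, the model would be generated from the grammar and trained on drawing examples.&lt;pre&gt;
```python
import math

# Hypothetical Gaussian models: for each label, a prior and a (mean, std)
# pair per constraint feature. The values are illustrative, not from the paper.
MODELS = {
    "button": {"prior": 0.5, "distance": (5.0, 2.0), "angle": (0.0, 10.0)},
    "scrollbar": {"prior": 0.5, "distance": (20.0, 5.0), "angle": (90.0, 10.0)},
}


def gaussian_log_pdf(x, mean, std):
    """Log density of a normal distribution N(mean, std**2) at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def log_posterior(label, features):
    """Unnormalized log p(label | features): log prior plus the sum of
    per-feature Gaussian log-likelihoods (naive-Bayes independence assumed)."""
    model = MODELS[label]
    score = math.log(model["prior"])
    for name, value in features.items():
        mean, std = model[name]
        score += gaussian_log_pdf(value, mean, std)
    return score


def map_label(features):
    """MAP decision: the label with the highest posterior score."""
    return max(MODELS, key=lambda label: log_posterior(label, features))


print(map_label({"distance": 6.0, "angle": 5.0}))    # prints button
print(map_label({"distance": 18.0, "angle": 85.0}))  # prints scrollbar
```
&lt;/pre&gt;The sketch works in log space so that multiplying many constraint likelihoods does not underflow. The normalizing term is skipped because it is the same for every label and does not change the MAP decision.&lt;br /&gt;&lt;br /&gt;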
The parse tree is pruned using cutoff values for the constraint posteriors.&lt;br /&gt;&lt;br /&gt;By varying the threshold, the authors achieve at best about 80% stroke-level accuracy and 90% stroke-level precision@3.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-114903408727706031?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/114903408727706031/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=114903408727706031&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/114903408727706031'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/114903408727706031'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2009/02/shilman-statistical-visual-language.html' title='Shilman - Statistical Visual Language'/><author><name>Joshua</name><uri>http://www.blogger.com/profile/09242972372915675569</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-419587453238059727</id><published>2008-05-10T13:06:00.002-05:00</published><updated>2008-05-10T13:25:25.127-05:00</updated><title type='text'>Eisenstein - Device independence and extensibility</title><content type='html'>Eisenstein, J.; Ghandeharizadeh, S.; Golubchik, L.; Shahabi, C.; Donghui Yan; Zimmermann, R., "Device independence and extensibility in gesture recognition," Virtual Reality, 2003. Proceedings. 
IEEE, pp. 207-214, 22-26 March 2003&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;They're trying to build a device-independent recognition framework, so you can plug in data from any device. The first layer is the raw data, where each sensor value is application-dependent. The second layer is a set of predicates that combine raw values into postures. The third layer is a set of temporal predicates that describe changes in posture over time. The fourth layer is a set of gestural templates that map temporal patterns of postures to specific gestures. &lt;br /&gt;&lt;br /&gt;Sensor values from layer 1 are mapped to predicates by hand for each device. Templates are defined only once, as combinations of predicates. A bunch of neural networks combine predicates into temporal data, and a bunch more combine temporal data into gestures. All these networks are trained with a whole lot of data.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;They omit gestures with a temporal component from their experiments, so they only test static ASL letters (everything except Z and J). This is pretty cheesy. Also, they are shooting for device independence, but you still have to map and train the connections between raw sensor values and predicates by hand, for each device. I understand you'd have to do this for any application, but it seems to defeat their purpose. I guess their benefits come at the higher levels, where you use predicates regardless of how they were constructed.&lt;br /&gt;&lt;br /&gt;This seems crazy complicated for /very/ little accuracy. Using neural networks to classify the static ASL letters (all but Z and J), they only get 67% accuracy. Other approaches get close to 95-99% on the same data. 
I guess things are a little too complex.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-419587453238059727?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/419587453238059727/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=419587453238059727&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/419587453238059727'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/419587453238059727'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/05/eisenstein-device-independence-and.html' title='Eisenstein - Device independence and extensibility'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-5039958509145575608</id><published>2008-05-10T12:56:00.003-05:00</published><updated>2008-05-10T13:05:46.524-05:00</updated><title type='text'>Schlomer - Gesture Rec with Wii Controller</title><content type='html'>Schlomer, Poppinga, Henze, Boll. Gesture Recognition with a Wii Controller (TEI 2008).&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Wiimote. Acceleration data. Quantize the (x,y,z) acceleration data using k-means into a codebook of size 14. Plug the quantized data into a left-right HMM. Segment gestures by making the user press and hold the A button during a gesture. 
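&lt;br /&gt;&lt;br /&gt;The quantize-then-HMM pipeline above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation. The following are all assumptions:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;synthetic Gaussian samples stand in for real Wiimote readings;&lt;/li&gt;&lt;li&gt;the HMM has 5 states;&lt;/li&gt;&lt;li&gt;the self-transition probability is 0.6;&lt;/li&gt;&lt;li&gt;emission probabilities are uniform and untrained (Baum-Welch training is omitted).&lt;/li&gt;&lt;/ul&gt;In a full recognizer you would train one HMM per gesture and pick the gesture whose model gives the highest log-likelihood.&lt;pre&gt;
```python
import math
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two (x, y, z) vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def quantize(sample, codebook):
    """Index of the nearest codebook vector: the discrete HMM symbol."""
    return min(range(len(codebook)), key=lambda i: squared_distance(sample, codebook[i]))


def kmeans(samples, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids used as the codebook."""
    rng = random.Random(seed)
    codebook = rng.sample(samples, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for s in samples:
            clusters[quantize(s, codebook)].append(s)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                codebook[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return codebook


def left_right_hmm(n_states, n_symbols, stay=0.6):
    """Left-right HMM: each state may only repeat or advance one state.
    Emissions are uniform placeholders; real ones would be trained
    (e.g. with Baum-Welch), which this sketch omits."""
    trans = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states - 1):
        trans[i][i] = stay
        trans[i][i + 1] = 1.0 - stay
    trans[-1][-1] = 1.0
    emit = [[1.0 / n_symbols] * n_symbols for _ in range(n_states)]
    start = [1.0] + [0.0] * (n_states - 1)  # always start in the first state
    return start, trans, emit


def log_likelihood(symbols, hmm):
    """Scaled forward algorithm: returns log P(symbols | hmm)."""
    start, trans, emit = hmm
    n = len(start)
    alpha = [start[i] * emit[i][symbols[0]] for i in range(n)]
    log_prob = 0.0
    for t, symbol in enumerate(symbols):
        if t:  # propagate one step through the transitions, then emit
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
                     for j in range(n)]
        scale = sum(alpha)  # P(this symbol | previous symbols)
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_prob


# Usage on synthetic data (a stand-in for real Wiimote acceleration samples).
rng = random.Random(1)
samples = [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
codebook = kmeans(samples, k=14)  # codebook size 14, as in the summary above
symbols = [quantize(s, codebook) for s in samples[:30]]
model = left_right_hmm(n_states=5, n_symbols=14)
print(len(codebook), round(log_likelihood(symbols, model), 2))
```
&lt;/pre&gt;Because the emissions here are untrained and uniform, every sequence of T symbols scores T log(1/14). The printed value therefore shows only that the pieces fit together, not that gestures are being discriminated.&lt;br /&gt;&lt;br /&gt;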
They correctly recognize about 90% of gestures (e.g., circle, square, tennis swing).&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;The point of this is... what? Wii games, like Wii tennis and bowling, are pretty darned accurate at this already.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-5039958509145575608?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/5039958509145575608/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=5039958509145575608&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/5039958509145575608'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/5039958509145575608'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/05/schlomer-gesture-rec-with-wii.html' title='Schlomer - Gesture Rec with Wii Controller'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-4424171109657601218</id><published>2008-05-10T12:41:00.002-05:00</published><updated>2008-05-10T12:55:46.544-05:00</updated><title type='text'>Murayama - Spidar G&amp;G</title><content type='html'>&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;We have a 6-DOF haptic device called the SPIDAR G that provides force feedback. Let's put two of them together, one for each hand. 
We'll make people do different things, like using one hand to manipulate a target and the other to manipulate an object they have to put into the target. If they run into something, we'll give them force feedback so they know they hit something. When they use two Spidars as opposed to just one, they can usually do things faster.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;So all they do is take one device, hook up another, and use the two together. With two devices, users tend to do things a little faster. This is pretty obvious, since instead of just moving the object, I can now also move the target, which reduces the amount of work I have to do.&lt;br /&gt;&lt;br /&gt;I wonder whether the Spidar's strings would get in your way and limit your movement. Surely they would. Rotation would also be tough, because you can't hold onto something and rotate it more than about 180 degrees.&lt;br /&gt;&lt;br /&gt;No real evaluation was performed, just a little bit of speedup data.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-4424171109657601218?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/4424171109657601218/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=4424171109657601218&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/4424171109657601218'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/4424171109657601218'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/05/murayama-spidar-g.html' title='Murayama - Spidar G&amp;G'/><author><name>- 
D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-4731659776015518561</id><published>2008-05-10T12:31:00.002-05:00</published><updated>2008-05-10T12:41:01.624-05:00</updated><title type='text'>Lapides - 3D Tractus</title><content type='html'>Lapides, P., Sharlin, E., Sousa, M. C., and Streit, L. 2006. The 3D Tractus: A Three-Dimensional Drawing Board. In Proceedings of the First IEEE international Workshop on Horizontal interactive Human-Computer Systems (January 05 - 07, 2006). TABLETOP. IEEE Computer Society, Washington, DC, 169-176. DOI= http://dx.doi.org/10.1109/TABLETOP.2006.33 &lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;I want to draw in 3D, but all I have is this tablet PC. I know, I'll put the tablet PC on a table that moves up and down, simulating the third dimension. I'll make a drawing program whose user interface shows the 3D drawing from a few angles so users won't get too confused. We'll use visual cues, like conveying depth through line width. Also, we'll use perspective projection so the user knows which things are 'below' the current plane of the tablet PC.&lt;br /&gt;&lt;br /&gt;People used it and said it was neat.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;This drawing/3D modeling approach is a little more realistic than other things we've read (HoloSketch or the superellipsoid clay thing), since all you need is a tablet and one of their funky elevator things. So I'll give it kudos there.&lt;br /&gt;&lt;br /&gt;I'm not sure how simple or intuitive it is, however, to have to move your tablet PC up and down to draw in the third dimension. 
I question the accuracy, especially if you're trying to line things up one on top of another, etc., since you don't really have a good idea of where things are along the Z axis. This is especially hard if what you're trying to draw lies above the current plane of the tablet PC, since their software only shows you what's below the current plane. &lt;br /&gt;&lt;br /&gt;Neat idea, but a bit klunky. Hope someone doesn't get their legs cut off if the Tractus goes berserk.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-4731659776015518561?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/4731659776015518561/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=4731659776015518561&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/4731659776015518561'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/4731659776015518561'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/05/lapides-3d-tractus.html' title='Lapides - 3D Tractus'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-3750723127915395646</id><published>2008-05-10T12:01:00.003-05:00</published><updated>2008-05-10T12:31:35.174-05:00</updated><title type='text'>Krahnstoever - Activity Recognition with Vision and RFID</title><content type='html'>Krahnstoever, N.; Rittscher, J.; Tu, P.; Chean, K.; 
Tomlinson, T., "Activity Recognition using Visual Tracking and RFID," Application of Computer Vision, 2005. WACV/MOTIONS '05 Volume 1. Seventh IEEE Workshops on, vol. 1, pp. 494-500, 5-7 Jan. 2005&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Person in an office or warehouse, watched by cameras. Track their movements with a Monte Carlo model examining the image frames. Augment this with RFID tags embedded in all the objects the person can interact with. Do activity recognition by examining how the person is moving (vision) and what they are interacting with (RFID). RFID helps augment visual tracking for the purposes of activity recognition.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;So they take an existing Monte Carlo visual tracking algorithm and magically throw RFID into the jar. They say this does better. Sort of a "duh" moment. Why do we let Civil Josh pick papers?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-3750723127915395646?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/3750723127915395646/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=3750723127915395646&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/3750723127915395646'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/3750723127915395646'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/05/krahnstoever-activity-recognition-with.html' title='Krahnstoever - Activity Recognition with Vision and RFID'/><author><name>- 
D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-479232640870076382</id><published>2008-05-10T11:55:00.003-05:00</published><updated>2008-05-10T12:01:18.423-05:00</updated><title type='text'>Bernardin - Grasping HMMs</title><content type='html'>Bernardin, K., K. Ogawara, et al. (2005). "A sensor fusion approach for recognizing continuous human grasping sequences using hidden Markov models." IEEE Transactions on Robotics 21(1): 47-57.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;I have a robot that I want to teach to grab things. I can teach it by example. I have 14 different types of grips that I use every day. I'll put pressure sensors in a glove under the CyberGlove. I will grab something, and then let it go. All of this data will be fed into an HMM built with the HTK speech recognition toolkit. The HMM will tell me which grasp I am making with up to about 90% accuracy.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Pretty neat. If you know what you're grasping, you can do things like activity recognition. This is especially helpful once you start using smart rooms, offices, etc. 
Maybe even Information Oriented Programming (IOP)!&lt;br /&gt;&lt;br /&gt;I think the pressure sensors really helped augment the CyberGlove, especially since there were so many grasp categories.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-479232640870076382?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/479232640870076382/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=479232640870076382&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/479232640870076382'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/479232640870076382'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/05/bernardin-grasping-hmms.html' title='Bernardin - Grasping HMMs'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-7877557077357179395</id><published>2008-05-10T11:49:00.002-05:00</published><updated>2008-05-10T11:55:15.402-05:00</updated><title type='text'>Nishino - Object modelling with gestures</title><content type='html'>Nishino, H., Utsumiya, K., and Korida, K. 1998. 3D object modeling using spatial and pictographic gestures. In Proceedings of the ACM Symposium on Virtual Reality Software and Technology (Taipei, Taiwan, November 02 - 05, 1998). VRST '98. ACM, New York, NY, 51-58. 
DOI= http://doi.acm.org/10.1145/293701.293708 &lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Put on special glasses to get a 3D stereoscopic image from a curved screen, and put gloves/motion trackers on your hands to track them. Have some virtual clay modelled by a superellipsoid (it's relatively easy to work with mathematically). Create a blob, deform it, mash it, pinch it, stretch it, put it in a pan, bake it up as fast as you can. Combine a bunch of blobs to make things like teapots, vases, and bigger blobs.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Good for professional sculptors who might want to fashion something without wasting real clay. But since clay is easy to recycle (just add water), who cares? If you're not a sculptor, are you good enough with your hands to make your blobs of junk look like things in real life? How accurate is the hand tracking? Could a noisy spike accidentally mash your teapot into oblivion?&lt;br /&gt;&lt;br /&gt;Pretty neat idea; I'm just not sure how useful it is.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-7877557077357179395?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/7877557077357179395/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=7877557077357179395&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/7877557077357179395'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/7877557077357179395'/><link rel='alternate' type='text/html' 
href='http://jbjohns.blogspot.com/2008/05/nishino-object-modelling-with-gestures.html' title='Nishino - Object modelling with gestures'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-463747453868343349</id><published>2008-05-10T11:44:00.002-05:00</published><updated>2008-05-10T11:49:34.719-05:00</updated><title type='text'>Campbell - Invariant Features, Tai-Chi</title><content type='html'>Campbell L W, Becker D A, Azarbayejani A, Bobick A F, and Pentland A, Invariant Features for 3-D Gesture Recognition, Proc. of FG'96 (1996) 157-162.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Look at a series of Tai-Chi gestures captured on video. Extract a lot of features about the gestures, including plain (x,y,z) coordinates, velocities of those coordinates, polar coordinates, and polar velocities. Compute each of these with and without head data (always with hand data). Plug each feature set into an HMM and see which one does best. Polar velocity without head data does best, at about 95% accuracy overall. Plain (x,y,z) does worst, at about 34% overall.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Just take a bunch of features and string them all together. Run a standard feature extraction/selection algorithm. Get a set of features that probably outperforms all of your sets.&lt;br /&gt;&lt;br /&gt;Win.&lt;br /&gt;&lt;br /&gt;This paper isn't really interesting, as it just shotguns a bunch of features into an HMM and sees who wins. 
&lt;br /&gt;&lt;br /&gt;Fail.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-463747453868343349?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/463747453868343349/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=463747453868343349&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/463747453868343349'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/463747453868343349'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/05/campbell-invariant-features-tai-chi.html' title='Campbell - Invariant Features, Tai-Chi'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-6429385656318648346</id><published>2008-05-10T11:37:00.002-05:00</published><updated>2008-05-10T11:43:49.637-05:00</updated><title type='text'>Wesche - FreeDrawer</title><content type='html'>Wesche, G. and Seidel, H. 2001. FreeDrawer: a free-form sketching system on the responsive workbench. In Proceedings of the ACM Symposium on Virtual Reality Software and Technology (Banff, Alberta, Canada, November 15 - 17, 2001). VRST '01. ACM, New York, NY, 167-174. DOI= http://doi.acm.org/10.1145/505008.505041 &lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;An electronic pen you can use to draw in 3D space. 
To make things simpler for their algorithm, you're restricted to spline curves. You trace out the general curve with the pen and the computer calculates the parameters of the spline. You can draw curves, modify them, connect curves together to form a network, and fill in surfaces between curves. You wear wonky VR goggles to see what you're drawing.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;There's a tradeoff between user freedom (virtual clay) and performance, and they choose performance by limiting a user's drawing style (restricted to splines). They claim this is easy because splines have a closed-form representation, are easily transferable (only the spline parameters need to be transmitted, not every voxel), and are computationally cheap (storing every voxel for virtual clay is expensive).&lt;br /&gt;&lt;br /&gt;They admit you need an artistic flair and a little bit of training to get used to using the splines. Well then, why not just train on a CAD system? Isn't the point to offer an intuitive interface with no need for training or restrictions? 
Plus, if you use CAD, you aren't limited to /just/ splines, you can be precise and exact, and you don't have to wear wonky 3D goggles.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-6429385656318648346?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/6429385656318648346/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=6429385656318648346&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/6429385656318648346'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/6429385656318648346'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/05/wesche-freedrawer.html' title='Wesche - FreeDrawer'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-1170170388058247585</id><published>2008-05-10T11:29:00.002-05:00</published><updated>2008-05-10T11:37:17.282-05:00</updated><title type='text'>Poddar - Gesture Spech, Weatherman</title><content type='html'>I. Poddar, Y. Sethi, E. Ozyildiz, R. Sharma. Toward Natural Gesture/Speech HCI: A Case Study of Weather Narration. Proc. Workshops on Perceptual User Interfaces, pages 1-6, November, 1998.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Three categories of gestures: pointing, area, and contour, each with three phases: preparation, making the gesture, and retraction. 
Use features that measure distances/angles between the face and hands and plug them into an HMM. Get 50-60% accuracy on four test sequences.&lt;br /&gt;&lt;br /&gt;Now add speech to the gesture data. Compute co-occurrences of marker words with different gestures and use the data to help the HMM classify gestures. Accuracy goes up by about 10%.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Adding speech to gesture data improves the accuracy. This is fairly obvious, and they've shown that it helps a little. The one thing I don't like is the manual labeling of speech data.&lt;br /&gt;&lt;br /&gt;I wish they had used more gestures, and their accuracies weren't great. But at least it was a fusion of contextual data.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-1170170388058247585?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/1170170388058247585/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=1170170388058247585&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/1170170388058247585'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/1170170388058247585'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/05/poddar-gesture-spech-weatherman.html' title='Poddar - Gesture Spech, Weatherman'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-4001824816735509767</id><published>2008-04-23T15:30:00.002-05:00</published><updated>2008-04-23T15:49:42.951-05:00</updated><title type='text'>Eisenstein - discourse topic and gestural form</title><content type='html'>Jacob Eisenstein, Regina Barzilay, and Randall Davis. "Discourse Topic and Gestural Form." AAAI 2008.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;The authors want to examine the relationship between gestures and meaning. They are looking for a correspondence between certain gestures and topics, irrespective of the "speaker" of the gesture. If gestures are speaker-independent and depend only on topic, this could improve gesture recognition accuracy.&lt;br /&gt;&lt;br /&gt;They set up a topic-author model. Gesture features are extracted for a series of different conversations about different topics, where the speaker is making gestures to accompany his speech. They model gesture features with normal distributions and topic/speaker gesture distributions with multinomials drawn from Dirichlet distributions (Dirichlet compound multinomial, or Polya distribution). They learn the model parameters with Bayesian inference and use statistical significance tests to determine that 12% of all gestures belong to specific topics. Thus, if we have prior information about the topic (i.e., from speech), we can use that contextual information to improve gesture recognition.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;The paper's purpose is to look for a link between gestures and topic. They find a link, but this isn't too surprising given their limited dataset. Furthermore, many of their videos (from which gestures and speech were extracted) were very limited in scope. 
It's my hypothesis that, given a more general scope of data, the percentage of topic-specific gestures would drop.&lt;br /&gt;&lt;br /&gt;It's true that about 10% of word occurrences (about 80% of the vocabulary; these numbers are from memory) in large corpora are topic-specific and are called content-carrying, since they can identify the topic of a document. However, I don't think there are that many distinct gestures, and there is a great deal more reuse of gestures across topics.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-4001824816735509767?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/4001824816735509767/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=4001824816735509767&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/4001824816735509767'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/4001824816735509767'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/04/eisenstein-discourse-topic-and-gestural.html' title='Eisenstein - discourse topic and gestural form'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-3477517216080254026</id><published>2008-04-14T16:49:00.002-05:00</published><updated>2008-04-14T17:33:51.021-05:00</updated><title type='text'>Chang - Feature selection and grasp</title><content type='html'>Lillian 
Y. Chang, Nancy S. Pollard, Tom M. Mitchell, and Eric P. Xing. "Feature selection for grasp recognition from optical markers." Proceedings of the 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems, San Diego, CA, USA, Oct 29 - Nov 2, 2007.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;31 markers on a hand give (x,y,z) positions. Use stepwise forward and backward selection to pick a reduced subset of these markers. Five markers give good accuracy: about 86%, compared to the 91% maximum accuracy of the full marker set.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;They use 6 grasp types; how easy or hard are these compared to the 14 types in Bernardin et al.?&lt;br /&gt;&lt;br /&gt;SFFS and SBFS are only locally optimal; what about +L-R or bidirectional selection?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-3477517216080254026?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/3477517216080254026/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=3477517216080254026&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/3477517216080254026'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/3477517216080254026'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/04/chang-feature-selection-and-grasp.html' title='Chang - Feature selection and grasp'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-4288929167971947444</id><published>2008-04-14T15:49:00.002-05:00</published><updated>2008-04-14T16:44:49.230-05:00</updated><title type='text'>Fels - Glove-Talk II</title><content type='html'>S. Sidney Fels and Geoffrey E. Hinton. "Glove-TalkII—A Neural-Network Interface which Maps Gestures to Parallel Formant Speech Synthesizer Controls." IEEE Transactions on Neural Networks, vol. 9, no. 1, January 1998.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;The hardware: a CyberGlove with a Polhemus 6-DOF tracker, a Contact Glove to measure contacts between the fingers and thumb, and a foot pedal. Three neural networks are implemented to extract speech parameters from the input devices and feed them to a speech synthesizer. Hand height controls pitch. The pedal controls volume.&lt;br /&gt;&lt;br /&gt;A V/C network determines whether the user is trying to make a vowel sound or a consonant sound. The inputs are finger flex values, there are 5 sigmoid feed-forward hidden units, and the output is the probability of making a vowel sound. A vowel is indicated by the user keeping all fingers unbent and the hand open.&lt;br /&gt;&lt;br /&gt;One network determines the vowel sound the user is trying to make. Vowel sounds are determined by XY position in space, as measured by the Polhemus. An RBF network determines where the user's hand is and outputs the appropriate vowel sound parameters for the speech synth.&lt;br /&gt;&lt;br /&gt;The last network looks at the contact glove data, determining which fingers are touching the thumb. Consonant phonemes are mapped to different hand configurations and pattern-matched with the network. 
The input is flex values; the output is consonant speech synth parameters.&lt;br /&gt;&lt;br /&gt;After 100 hours of training by one poor sap, who seems to have provided 2000 examples of input, he can produce "intelligible and somewhat natural sounding" speech, with the added bonus that he "finds it difficult to speak quickly, pronounce polysyllabic words, and speak spontaneously."&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;First, a caveat. This is a neat idea. I've not seen gesture rec applied to something like this. As far as the idea goes, I'd give it an 8.5 / 10. Also, it's important to remember that humans, using their mouth parts and vocal tract, take what...5 years?...to learn how to produce decent speech. So of course something like this will come with the high cost of training.&lt;br /&gt;&lt;br /&gt;Second, the approach here is poor. The system is far too complicated with all the pedals and hand-wavey motions. One obvious way to simplify it is to remove the second glove (Contact Glove) completely. The authors don't really say what it's used for, and it seems like it's not used for much, especially if the pedal can control stops, etc. For all the vowel and consonant networks, they're basically just performing a nearest-neighbor lookup. Why don't they do that and make things much simpler? Perhaps the reason is the blending and smoothness of the speech parameters when moving from one sound to another, since the network provides function fitting and interpolation of values. But I think nearest neighbor would work well. &lt;br /&gt;&lt;br /&gt;There are standard ways to compute the centers and variances for the RBFs in their networks from the training data. 
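&lt;br /&gt;&lt;br /&gt;One common recipe, sketched here in Python under my own assumptions (2D hand positions as the input, k-means for the centers, and each cluster's mean squared distance as its variance; none of this is taken from the paper), looks like this:&lt;pre&gt;
```python
import math
import random

Point = tuple[float, float]


def nearest(p: Point, centers: list[Point]) -> int:
    """Index of the center closest to p."""
    return min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))


def kmeans(points: list[Point], k: int, iters: int = 50, seed: int = 0) -> list[Point]:
    """Plain Lloyd's k-means, initialized from k randomly chosen data points."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        groups: list[list[Point]] = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Move each center to its cluster mean; keep it if the cluster is empty.
        centers = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers


def rbf_variances(points: list[Point], centers: list[Point]) -> list[float]:
    """Mean squared distance from each center to the points assigned to it."""
    sums = [0.0] * len(centers)
    counts = [0] * len(centers)
    for p in points:
        i = nearest(p, centers)
        sums[i] += math.dist(p, centers[i]) ** 2
        counts[i] += 1
    # Floor the variance so a single-point cluster does not get zero width.
    return [max(s / n, 1e-6) if n else 1.0 for s, n in zip(sums, counts)]


def rbf_activations(x: Point, centers: list[Point], variances: list[float]) -> list[float]:
    """Gaussian RBF hidden-layer outputs for input x."""
    return [math.exp(-math.dist(x, c) ** 2 / (2 * v)) for c, v in zip(centers, variances)]
```
&lt;/pre&gt;The same k-means centers could also serve as the templates for the nearest-neighbor lookup suggested above.&lt;br /&gt;&lt;br /&gt;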
There is no need for hand-picked or hard-coded values.&lt;br /&gt;&lt;br /&gt;So if the idea gets 8.5/10, their execution gets a 3/10.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-4288929167971947444?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/4288929167971947444/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=4288929167971947444&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/4288929167971947444'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/4288929167971947444'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/04/fels-glove-talk-ii.html' title='Fels - Glove-Talk II'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-8713997141060381212</id><published>2008-04-09T15:41:00.002-05:00</published><updated>2008-04-09T15:50:23.631-05:00</updated><title type='text'>Kim - RFID target tracking</title><content type='html'>Kim, Myungsik, et al. "RFID-enabled target tracking and following with a mobile robot using direction finding antennas."&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;The authors propose a system that lets a robot find the direction to a target, either stationary or mobile, and follow it or go to it, using RFID. The target has an RFID transponder. 
The robot has two antennae, perpendicular to each other, on a motor mount so they can rotate independently of the robot. The antennae pick up different signals from the transponder, and the system computes direction and distance from the signal intensity and the signal strength ratio. By rotating the antenna array separately from the robot, they avoid the problem of the robot freaking out in environments densely populated with obstacles. The system can average the signals over time as the array rotates, then make a decision after the rotation.&lt;br /&gt;&lt;br /&gt;Results: it can follow stuff.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;1) What is the system latency?&lt;br /&gt;&lt;br /&gt;2) How well does it work "in real life" with a bunch of obstacles?&lt;br /&gt;&lt;br /&gt;3) To use this for hand tracking, we'd put an RFID transponder on our hand, and the computer could track it. How accurate is it? The authors do say the signal ratio is not that great for accuracy ("This makes it difficult to precisely estimate the DOA directly from the ratio") because of noise. Is it centimeter/inch accurate, or is it crappy like the P5 glove? 
Is the best we can hope for a "Your hand is over there somewhere"?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-8713997141060381212?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/8713997141060381212/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=8713997141060381212&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/8713997141060381212'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/8713997141060381212'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/04/kim-rfid-target-tracking.html' title='Kim - RFID target tracking'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-3448649202337800204</id><published>2008-04-07T11:59:00.003-05:00</published><updated>2008-04-07T12:13:40.615-05:00</updated><title type='text'>Brashear - ASL game</title><content type='html'>Brashear, Helene, et al. "American sign language recognition in game development for deaf children." ASSETS 2006.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Two parts: 1) a Wizard of Oz game to help deaf kids of hearing parents (who presumably can't sign) learn sign language. 
2) a recognition system for ASL words/sentences to automate the game's feedback.&lt;br /&gt;&lt;br /&gt;The recognition system uses cameras, a colored glove, and accelerometers attached to the glove. The glove is colored to help image segmentation and hand tracking within the image. Data is automatically segmented at the sentence level with "push to sign" (click the mouse to start, click to end). Each image is converted to HSV histograms, which are enhanced with filtering. Tracking is assisted using HSV values that are normalized based on new values and weighted old values (giving more mass to the area where the hand was in the last frame). The features are the x, y, z accelerometer values and vision data: change in the x,y center position of the hand, length of the major/minor axes, eccentricity, orientation angle, and direction of the major axis in x,y offset. Data is classified with HMMs using &lt;a href="http://jbjohns.blogspot.com/2008/02/westeyn-gt2k.html"&gt;GT2K&lt;/a&gt;. With 90/10 random holdout splits repeated 100 times (5 kids), they achieve 86% word accuracy on average for their user-independent models, and 61% sentence accuracy.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Decent word accuracy. I think their HMM sentence accuracy was hurt by the fact that they did not have much training data. With more data, and with something a little more robust than GT2K, they might be able to do better. I don't like how they tried to pass off user-dependent results, since these are pretty worthless when you have to train per user. 
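&lt;br /&gt;&lt;br /&gt;Going back to the vision features in the summary: major/minor axis length, eccentricity, and orientation are what you typically get from the second moments of the segmented hand blob. A minimal Python sketch, assuming the blob is given as a list of (x, y) pixel coordinates; Brashear et al.'s exact definitions may differ:&lt;pre&gt;
```python
import math


def blob_ellipse(pixels: list[tuple[float, float]]) -> dict[str, float]:
    """Second-moment ellipse features of a segmented blob (list of (x, y) pixels)."""
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    # Central second moments, i.e. the covariance matrix [[mxx, mxy], [mxy, myy]].
    mxx = sum((x - cx) ** 2 for x, _ in pixels) / n
    myy = sum((y - cy) ** 2 for _, y in pixels) / n
    mxy = sum((x - cx) * (y - cy) for x, y in pixels) / n
    # Eigenvalues of the 2x2 covariance matrix; l1 is the larger one.
    half_trace = (mxx + myy) / 2
    root = math.sqrt(((mxx - myy) / 2) ** 2 + mxy ** 2)
    l1, l2 = half_trace + root, max(half_trace - root, 0.0)
    return {
        "cx": cx,
        "cy": cy,
        # For a uniformly filled ellipse, the variance along an axis is
        # (semi-axis)^2 / 4, so the full axis length is 4 * sqrt(eigenvalue).
        "major": 4 * math.sqrt(l1),
        "minor": 4 * math.sqrt(l2),
        "eccentricity": math.sqrt(1 - l2 / l1) if l1 > 0 else 0.0,
        # Angle of the major axis from the x axis, in radians.
        "orientation": 0.5 * math.atan2(2 * mxy, mxx - myy),
    }
```
&lt;/pre&gt;A single-pixel-wide vertical line, for instance, gives an eccentricity of 1 and an orientation of pi/2.&lt;br /&gt;&lt;br /&gt;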
With user-dependent models, you can probably just use something akin to kNN and get close to 100% accuracy, since a user probably doesn't vary /too/ much from one instance to another.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-3448649202337800204?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/3448649202337800204/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=3448649202337800204&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/3448649202337800204'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/3448649202337800204'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/04/brashear-asl-game.html' title='Brashear - ASL game'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-6739297559599476136</id><published>2008-04-07T11:41:00.002-05:00</published><updated>2008-04-07T11:58:17.768-05:00</updated><title type='text'>Ogris - Ultrasonic and Manipulative Gestures</title><content type='html'>Ogris, Georg, et al. "Using ultrasonic hand tracking to augment motion analysis based recognition of manipulative gestures." 
ISWC 2005.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;The authors want to augment a vision-based system with an ultrasonic positioning system to determine what action is being performed on which tool/object in a workshop or similar setting. They look at model-based classification (series of data frames in sequence) with left-right HMMs. They look at frame-based classification using decision trees (C4.5) and kNN. They also examine methods of using ultrasonic data to constrain the plausible results of the classifiers. They classify to get a ranked list, then pick the candidate that is most plausible given the ultrasonic data. If no candidate is plausible enough, the ultrasonic data is deemed bad and the most likely classification result is chosen. &lt;br /&gt;&lt;br /&gt;Using ultrasound alone we get 59% with C4.5 and 60% with kNN. We get 84% accuracy classifying frames of data with kNN. HMMs reach only 65% accuracy, due to a lack of training data and longer, unstructured gestures. Using plausibility analysis, we can increase frame-based accuracy to 90%.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;I like that they use ultrasonics to get position data to help improve classification accuracy. But this doesn't seem like a groundbreaking addition. 
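&lt;br /&gt;&lt;br /&gt;For concreteness, the plausibility step from the summary might look roughly like this in Python. The labels, the plausibility scores, and the single trust threshold are hypothetical stand-ins; the paper's actual plausibility model isn't reproduced here.&lt;pre&gt;
```python
def fuse(ranked: list[str], plausibility: dict[str, float], threshold: float) -> str:
    """Pick the most plausible candidate from a classifier's ranked output.

    ranked:       class labels, most likely first (classifier output).
    plausibility: label -> score derived from the ultrasonic position data.
    threshold:    minimum score for the position data to be trusted.
    """
    # max() returns the first maximal element, so ties keep the classifier's order.
    best = max(ranked, key=lambda label: plausibility.get(label, 0.0))
    if plausibility.get(best, 0.0) >= threshold:
        return best
    # Nothing is plausible enough: treat the ultrasonic data as bad and
    # fall back to the classifier's top choice.
    return ranked[0]


print(fuse(["saw", "drill", "hammer"], {"drill": 0.9, "saw": 0.2}, threshold=0.5))  # prints drill
```
&lt;/pre&gt;&lt;br /&gt;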
They just use a bunch of different classifiers and, gasp, find that contextual information (ultrasonic data) can improve classification accuracy.&lt;br /&gt;&lt;br /&gt;Decent, but nothing groundbreaking or surprising.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-6739297559599476136?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/6739297559599476136/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=6739297559599476136&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/6739297559599476136'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/6739297559599476136'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/04/ogris-ultrasonic-and-manipulative.html' title='Ogris - Ultrasonic and Manipulative Gestures'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-3369977259407168536</id><published>2008-04-07T11:10:00.002-05:00</published><updated>2008-04-07T11:41:11.781-05:00</updated><title type='text'>Sawada - Conducting and Tempo Prediction</title><content type='html'>Sawada, Hideyuki, and Shuji Hashimoto. "Gesture recognition using an acceleration sensor and its application to musical performance control." 
Electronics and Communications in Japan 80:5, 1997.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Use accelerometers and gyroscopes to get data on the moving hand. Compute 2D acceleration vectors in the XY, XZ, and YZ planes. One feature is the sum of changes in acceleration, another is the rotation of acceleration, and the third feature is the aspect ratio of the two unit components of each 2D acceleration vector (which acceleration component is larger). Eight more features give the distribution of acceleration over eight principal directions separated by pi/4. These 11 features are computed for each of the three planes, giving 33 features per gesture. The mean and standard deviation of each feature are computed, and an input is classified as the gesture with the lowest weighted error (sum of squared differences from the mean divided by the standard deviation).&lt;br /&gt;&lt;br /&gt;They look at the data to see where acceleration maxima occur; these are places where a conductor changes direction, marking off tempo beats. To smooth the computer's performance with respect to changing/noisy tempo beats made by a human, the system uses prediction to guess the next tempo beat. A parameter can be set to change the system's reliance on the human compared to its ability to smooth out noisy tempo beats (linear prediction).&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;They don't really explain their features well. Furthermore, they make a whole thing about the rotation feature and then say they don't use it. Well, big deal, then. Why list rotation as a feature?&lt;br /&gt;&lt;br /&gt;They're not doing gesture recognition, just marking tempo beats using changes in acceleration. They don't need 33 features for this. They need 3: acceleration in X, Y, and Z. The rest are derived from those. 
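&lt;br /&gt;&lt;br /&gt;To make that concrete, here is a minimal Python sketch that marks beats using nothing but the raw X, Y, Z acceleration. The peak threshold and sample rate are assumptions you would tune, and this is plain peak picking, not Sawada and Hashimoto's method (it leaves out their tempo prediction entirely).&lt;pre&gt;
```python
import math
from typing import Optional


def beat_frames(acc: list[tuple[float, float, float]], threshold: float) -> list[int]:
    """Frame indices where the acceleration magnitude has a local peak above threshold."""
    mags = [math.sqrt(x * x + y * y + z * z) for x, y, z in acc]
    return [
        i
        for i in range(1, len(mags) - 1)
        # Strict rise, non-strict fall, so a flat-topped peak is counted once.
        if mags[i] > threshold and mags[i] > mags[i - 1] and mags[i] >= mags[i + 1]
    ]


def tempo_bpm(beats: list[int], sample_rate_hz: float) -> Optional[float]:
    """Average tempo in beats per minute; None if fewer than two beats were found."""
    if len(beats) in (0, 1):
        return None
    frames_per_beat = (beats[-1] - beats[0]) / (len(beats) - 1)
    return 60.0 * sample_rate_hz / frames_per_beat


# Made-up data: two sharp peaks four frames apart, sampled at 8 Hz.
acc = [(0.0, 0.0, 0.0), (0.0, 0.0, 5.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0),
       (0.0, 0.0, 0.0), (0.0, 0.0, 5.0), (0.0, 0.0, 0.0)]
print(tempo_bpm(beat_frames(acc, threshold=2.0), sample_rate_hz=8.0))  # prints 120.0
```
&lt;/pre&gt;&lt;br /&gt;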
They can predict tempo fairly accurately, but I'm not that impressed.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-3369977259407168536?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/3369977259407168536/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=3369977259407168536&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/3369977259407168536'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/3369977259407168536'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/04/sawada-conducting-and-tempo-prediction.html' title='Sawada - Conducting and Tempo Prediction'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-9597958695066155</id><published>2008-03-31T15:12:00.002-05:00</published><updated>2008-03-31T15:23:22.380-05:00</updated><title type='text'>Mantyjarvi - Accelerometer DVD HMMs</title><content type='html'>Mantyjarvi, Jani, Juha Kela, Panu Korpipaa, and Sanna Kallio. "Enabling fast and effortless customisation in accelerometer based gesture interaction." MUM 2004.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Take accelerometer data. Segment the gesture by holding a button down during movement. Resample the gesture to 40 frames. 
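&lt;br /&gt;&lt;br /&gt;A minimal Python sketch of that resampling step, assuming plain linear interpolation between neighboring samples (the paper's exact method may differ):&lt;pre&gt;
```python
Sample = tuple[float, float, float]


def resample(frames: list[Sample], n: int = 40) -> list[Sample]:
    """Linearly interpolate a sequence of 3-axis accelerometer samples to n frames."""
    if not frames or 1 > n:
        raise ValueError("need at least one input frame and a positive n")
    if len(frames) == 1 or n == 1:
        # Nothing to interpolate between (or only one frame wanted).
        return [frames[0]] * n
    last = len(frames) - 1
    out = []
    for k in range(n):
        # Position of output frame k on the original, fractional frame axis.
        t = k * last / (n - 1)
        # Left neighbor, clamped so that i + 1 is always a valid index.
        i = min(int(t), last - 1)
        frac = t - i
        a, b = frames[i], frames[i + 1]
        out.append(tuple(ai + frac * (bi - ai) for ai, bi in zip(a, b)))
    return out


gesture = [(0.0, 0.0, 9.8), (0.5, 0.1, 9.6), (1.2, 0.3, 9.1)]  # made-up samples
print(len(resample(gesture)))  # prints 40
```
&lt;/pre&gt;&lt;br /&gt;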
Vector quantize (with k-means) the 40 3D points into 40 codewords (the codebook size is 8). Plug the resulting 40-codeword sequences into an ergodic, 5-state HMM and classify. Train the HMMs until the percent difference in log likelihood is below a threshold.&lt;br /&gt;&lt;br /&gt;Need more training data? Augment some of the training examples you do have with noise, either uniformly or normally distributed. A signal-to-noise ratio of about 3 is best for Gaussian noise and 5 for uniform, and both slightly increase accuracy when used to generate training examples. Accuracy increases with more training examples. They get about 98% accuracy on their easy data set.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Another paper that uses a ridiculously easy gesture set with powerful hidden Markov models. I think Rubine or $1 would do just as well, and wouldn't require the complexity of HMMs.&lt;br /&gt;&lt;br /&gt;I do like the idea of generating new training examples by adding artificial noise. This can be useful when you don't have a lot of training data to begin with. However, I don't like the way they did it. They should be learning the parameters for their distributions by examining the real data. For example, use the real training examples to estimate the mean and covariance. Then, sample from the fitted distribution(s) to get new training examples, rather than adding noise to a real training example (which will make outliers even worse). Also, it's not clear if there is any real advantage to using Gaussian over uniformly distributed noise. In Fig 6, Gaussian seems to do better for low SNR and uniform better for high SNR. And in Fig 7, the results are all over the place. 
Are the differences in accuracies statistically significant?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-9597958695066155?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/9597958695066155/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=9597958695066155&amp;isPopup=true' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/9597958695066155'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/9597958695066155'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/03/mantyjarvi-accelerometer-dvd-hmms.html' title='Mantyjarvi - Accelerometer DVD HMMs'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-5727956678222885271</id><published>2008-03-31T11:07:00.000-05:00</published><updated>2008-03-31T11:08:51.085-05:00</updated><title type='text'>Wobbrock - $1</title><content type='html'>&lt;a href="http://jbjohns.blogspot.com/2007/12/wobbrock-et-al-1-recognizer.html"&gt;Wobbrock et al - $1 Recognizer&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-5727956678222885271?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/5727956678222885271/comments/default' title='Post Comments'/><link 
rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=5727956678222885271&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/5727956678222885271'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/5727956678222885271'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/03/wobbrock-1.html' title='Wobbrock - $1'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-7695444086757965915</id><published>2008-03-30T18:39:00.003-05:00</published><updated>2008-03-30T18:45:28.331-05:00</updated><title type='text'>Lieberman - TIKL</title><content type='html'>Lieberman, Jeff and Cynthia Breazeal. "TIKL: Development of a wearable vibrotactile feedback suit for improved human motor learning."&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;1) Teacher puts on this suit with VICON sensors and built-in vibrating doo-hahs.&lt;br /&gt;2) Teacher performs a gesture, and the system is trained on it. &lt;br /&gt;3) Give the suit to the student and adjust it, etc.&lt;br /&gt;4) Student performs the gesture, and the suit vibrates to correct errors (the difference in joint angles between teacher and student, multiplied by a coefficient that sets the amount of feedback). It can give cues to rotate, bend, etc.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Blah blah, graphs. Both overall error and training time are reduced with the suit compared to training without it.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Admitted flaws: cost of VICON and bulkiness/hassle of the half-body suit. 
The idea here is //REALLY// neat: using vibration feedback to correct gestures. I'm just not that into it, because there aren't any algorithms here that a machine learning person like me gets excited about.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-7695444086757965915?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/7695444086757965915/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=7695444086757965915&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/7695444086757965915'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/7695444086757965915'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/03/lieberman-tikl.html' title='Lieberman - TIKL'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-3677700122254153752</id><published>2008-03-30T18:29:00.002-05:00</published><updated>2008-03-30T18:38:38.265-05:00</updated><title type='text'>Lee - Neural Network Taiwanese Sign Language</title><content type='html'>Lee, Yung-Hui, and Cheng-Yueh Tsai. "Taiwan sign language (TSL) recognition based on 3D data and neural networks." Expert Systems with Applications (2007). doi:10.1016/j.eswa.2007.10.038&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Vision-based neural network posture recognition (20 static postures). 
Segmentation is manual, and the recordings are rigged to capture nearly perfect postures. The hands/fingers are tracked by a VICON system (3D coordinates), and the tracks are filtered. The features are the distances between different landmarks on the hand, normalized to account for varying hand sizes. The features are fed into a neural network with 2 hidden layers of 250 hidden units each, trained for 3000 epochs or until the root mean squared error is &lt; 0.0001. They get about 95% accuracy.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Holy overtraining, Batman! That's a lot of hidden units, especially for a problem set up to be this easy: static postures, practiced until they were perfect. Recognition should be close to perfect. Just do template matching. Also, don't include your training data in your test set.&lt;br /&gt;&lt;br /&gt;Your features must really suck if you can't get closer to 100% on this problem (like &gt; 99%). Even with pixel-by-pixel template matching, you should get pretty darn close to 99%. 
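&lt;br /&gt;&lt;br /&gt;Here is a minimal Python/NumPy sketch of the kind of template matching suggested above. It computes size-normalized pairwise landmark distances, similar in spirit to the paper's features, and then does a 1-nearest-neighbour lookup against stored examples. Which landmark pair sets the hand-size scale is a hypothetical choice here, not the normalization the paper uses. The function names are also hypothetical.

```python
from itertools import combinations

import numpy as np


def distance_features(landmarks, ref_pair=(0, 1)):
    """Pairwise distances between 3D hand landmarks, divided by one
    reference distance so that overall hand size cancels out.

    landmarks: array of shape [n_landmarks, 3].
    ref_pair: hypothetical choice of the two landmarks whose distance
    serves as the hand-size scale (e.g. wrist to middle-finger base).
    """
    pts = np.asarray(landmarks, dtype=float)
    scale = np.linalg.norm(pts[ref_pair[0]] - pts[ref_pair[1]])
    dists = [np.linalg.norm(pts[i] - pts[j])
             for i, j in combinations(range(len(pts)), 2)]
    return np.array(dists) / scale


def classify_by_template(sample, templates):
    """1-nearest-neighbour template matching.

    templates: dict mapping label -> list of landmark arrays.
    Returns the label of the closest stored template.
    """
    feats = distance_features(sample)
    best_label, best_dist = None, np.inf
    for label, examples in templates.items():
        for example in examples:
            d = np.linalg.norm(feats - distance_features(example))
            if best_dist > d:
                best_label, best_dist = label, d
    return best_label
```

The features are ratios of distances, so translating or uniformly scaling the hand leaves them unchanged. That invariance is the point of normalizing for hand size.&lt;br /&gt;&lt;br /&gt;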
Heck, even handwritten digit recognition is close to 100%.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-3677700122254153752?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/3677700122254153752/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=3677700122254153752&amp;isPopup=true' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/3677700122254153752'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/3677700122254153752'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/03/lee-neural-network-taiwanese-sign.html' title='Lee - Neural Network Taiwanese Sign Language'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-1518908523295892637</id><published>2008-03-30T18:15:00.002-05:00</published><updated>2008-03-30T18:29:31.322-05:00</updated><title type='text'>Patwardhan - Predictive EigenTracker</title><content type='html'>Patwardhan, K. S. and S. D. Roy. "Hand gesture modelling and recognition involving changing shapes and trajectories, using a Predictive EigenTracker." Pattern Recognition Letters 28 (2007), 329--34.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;The authors seek to recognize dynamic hand gestures that change in shape as well as in motion. 
They use principal components analysis (PCA) to get an eigenspace representation of the objects they wish to track (hands). Within the eigenspace, particle filtering is used to predict where the eigen-hands (hand images projected into the eigenspace) will appear next. Skin color and motion cues are used to initialize the system automatically.&lt;br /&gt;&lt;br /&gt;The EigenTracker also segments the hand motions (second paragraph of section 3): "a drastic change in the appearance of the gesticulating hand, caused by the change in the hand shape, results in a large reconstruction error. This forces an epoch change, indicating an new shape of the gesticulating hand." The segments are used to create shape/motion pairs for the gesture. Trajectories are modeled with linear regression (least-squares linear approximation).&lt;br /&gt;&lt;br /&gt;The tracked hand gestures are modeled as sequences of shape/movement pairs. Each gesture model is trained to get a mean gesture and covariance (a Gaussian model), and a new gesture gets the label of the model with the smallest Mahalanobis distance to it. &lt;br /&gt;&lt;br /&gt;5 eigenvectors are used in PCA to capture 90% of the variance. Each gesture is split into 2 epochs. Using Mahalanobis distance, they get 100% classification accuracy.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;They test with their training data, so this is crap. Also, their dataset is extremely simple, with very distinct, well-defined shape/trajectory patterns. And their background and image tracking are very clean (not a lot of noise) and too easy as well. They say they kept the data easy in order to establish an upper bound on classification accuracy...which turns out to be 100%. So, um, no duh? 
I'm going to make something impossible to classify and prove the lower bound is 0% (or at most 1/n, a random guess). Sound good?&lt;br /&gt;&lt;br /&gt;That said, I do like the way they use PCA to simplify the data and particle filtering to both track the hand and segment epochs. It's just their data sets that leave me feeling unimpressed.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-1518908523295892637?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/1518908523295892637/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=1518908523295892637&amp;isPopup=true' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/1518908523295892637'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/1518908523295892637'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/03/patwardhan-predictive-eigentracker.html' title='Patwardhan - Predictive EigenTracker'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-8490259758329967133</id><published>2008-03-30T18:11:00.003-05:00</published><updated>2008-03-30T18:15:20.977-05:00</updated><title type='text'>Kratz - Wiizards</title><content type='html'>Kratz, Louis, Matthew Smith, and Frank J. Lee. "Wiizards: 3D Gesture Recognition for Game Play Input." 
FuturePlay 2007.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;So basically you have a Wii remote that reports {x,y,z} accelerometer readings every so often, generating a sequence of these readings. A hidden Markov model is trained on the sequences for each gesture, and a new gesture is classified as the model with the maximum Viterbi probability. As you increase the number of HMM states and the number of training examples, accuracy increases. Without user-specific training data, you get around 50% accuracy regardless. As you increase the number of states, the system also slows down.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;The application is neat, but all their results are of the "Duh" type. The game is neater than the implementation details, since you can combine spells and stuff for different effects.&lt;br /&gt;&lt;br /&gt;How do they do segmentation?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-8490259758329967133?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/8490259758329967133/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=8490259758329967133&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/8490259758329967133'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/8490259758329967133'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/03/kratz-wiizards.html' title='Kratz - Wiizards'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-2961685337539279360</id><published>2008-03-19T14:57:00.002-05:00</published><updated>2008-03-19T15:07:26.936-05:00</updated><title type='text'>Kato - Hand Tracking w/ PCA-ICA Approach</title><content type='html'>Kato, Makoto, Yen-Wei Chen, and Gang Xu. "Articulated Hand Tracking by PCA-ICA Approach." In Proceedings of the 7th International Conference on Automatic Face and Gesture Recognition (FGR'06).&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Kato et al. seek a way to represent hand data that is easier to handle. The problem they present is that hand-tracking data has too many dimensions to handle efficiently. They take motion data (bending each finger down to touch the palm) and split it into 100 time instants, with each instant containing bend data for 20 different joints in the hand. So each gesture (whole range of motion) is a 2000-dimensional data vector (100 20-D vectors concatenated together).&lt;br /&gt;&lt;br /&gt;They try feature extraction with both PCA and ICA. They say ICA is better because it can extract the independent movements of the individual fingers, whereas PCA components mix the movements of several fingers together. Then they mention hand tracking using particle filtering, where the next position (?) of the hand is estimated from its current position. &lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;This paper has no clear purpose. I don't understand what the authors are trying to tell me. Because of that, I don't have much to offer that's not a rant.&lt;br /&gt;&lt;br /&gt;PCA is not supposed to give you "feasible" hand positions. 
It tells you the directions of the highest variance.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-2961685337539279360?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/2961685337539279360/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=2961685337539279360&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/2961685337539279360'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/2961685337539279360'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/03/kato-hand-tracking-w-pca-ica-approach.html' title='Kato - Hand Tracking w/ PCA-ICA Approach'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-127568278392612886</id><published>2008-03-17T16:04:00.003-05:00</published><updated>2008-03-17T16:43:26.927-05:00</updated><title type='text'>Jenkins - ST-ISOMAP</title><content type='html'>Jenkins, O.C. and Mataric, M.J. "A spatio-temporal extension to Isomap nonlinear dimension reduction." ICML 2004.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Jenkins and Mataric present an extension to ISOMAP that takes temporal data into account when constructing manifolds. ISOMAP finds a low-dimensional embedding of data lying on a manifold in a high-dimensional space, using geodesic distances and multidimensional scaling. 
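&lt;br /&gt;&lt;br /&gt;For reference, plain ISOMAP fits in a few lines. The following minimal Python/NumPy sketch builds a k-nearest-neighbour graph, estimates geodesic distances with Floyd-Warshall shortest paths, and then applies classical MDS. It is an O(n^3) illustration for small data sets, not Jenkins and Mataric's implementation. It also leaves out the temporal adjustment that ST-ISOMAP adds.

```python
import numpy as np


def isomap(X, n_neighbors=5, n_components=2):
    """Minimal ISOMAP: kNN graph -> geodesic (shortest-path) distances
    -> classical multidimensional scaling (MDS)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    # Keep only edges to each point's k nearest neighbours (symmetrized).
    G = np.full((n, n), np.inf)
    nearest = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    for i in range(n):
        G[i, nearest[i]] = D[i, nearest[i]]
    G = np.minimum(G, G.T)
    np.fill_diagonal(G, 0.0)

    # Floyd-Warshall all-pairs shortest paths approximate geodesic distances.
    for k in range(n):
        G = np.minimum(G, G[:, k:k + 1] + G[k:k + 1, :])
    if np.isinf(G).any():
        raise ValueError("neighbourhood graph is disconnected; increase n_neighbors")

    # Classical MDS: double-center the squared distances, take top eigenvectors.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))
```

For real data, scikit-learn's sklearn.manifold.Isomap class is the more practical choice.&lt;br /&gt;&lt;br /&gt;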
ST-ISOMAP is an extension that uses temporal information. Items that are close to each other temporally have their spatial distances reduced. &lt;br /&gt;&lt;br /&gt;The idea is that in some domains, like movement of an arm, things that are close together spatially might be quite different. For example, an arm moving one way might be very different from an arm moving the other way. The temporal differences between these gestures would be high because you'd arrive at the same spatial location via different temporal paths (sequences of arm locations). Likewise, seemingly different spatial locations might be very similar, and only 'close' to each other with respect to temporal data (arm movements in the same direction but at different heights off the ground). ST-ISOMAP tries to capture these things.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;ISOMAP is a proven algorithm, and so is this extension for finding the manifold in temporal data. I think this could be useful for clustering haptic gesture information. The high-dimensional space of finger and hand locations could be reduced with ISOMAP into a simpler space where gestures could be segmented or classified more easily.&lt;br /&gt;&lt;br /&gt;Maybe. Seems like a neat approach, anyhow. And ISOMAP is used for a /ton/ of stuff in machine learning, so it's not like this is a cheesy hack that no one really uses.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;BibTeX&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;@inproceedings{jenkins2004ste,&lt;br /&gt;  title={{A spatio-temporal extension to Isomap nonlinear dimension reduction}},&lt;br /&gt;  author={Jenkins, O.C. 
and Matari{\'c}, M.J.},&lt;br /&gt;  booktitle={International Conference on Machine Learning},&lt;br /&gt;  year={2004},&lt;br /&gt;  publisher={ACM Press New York, NY, USA}&lt;br /&gt;}&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-127568278392612886?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/127568278392612886/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=127568278392612886&amp;isPopup=true' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/127568278392612886'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/127568278392612886'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/03/jenkins-st-isomap.html' title='Jenkins - ST-ISOMAP'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-365706546604683534</id><published>2008-02-27T08:51:00.002-06:00</published><updated>2008-03-30T18:06:05.460-05:00</updated><title type='text'>Sagawa - Recognizing Sequence Japanese Sign Lang. Words</title><content type='html'>Sagawa, H. and Takeuchi, M. 2000. "A Method for Recognizing a Sequence of Sign Language Words Represented in a Japanese Sign Language Sentence." In Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition 2000 (March 26 - 30, 2000). FG. IEEE Computer Society, Washington, DC, 434. 
&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;The authors present a method for segmenting gestures in Japanese sign language. Using a set of 200 JSL sentences (100 for training and 100 for testing), they train a set of parameter thresholds. The thresholds are used to determine the borders of signed words, to decide whether a word is one- or two-handed, and to distinguish transitions from actual words.&lt;br /&gt;&lt;br /&gt;They segment gestures using "hand velocity," the average change in position of all the hand parts from one point to the next. Points of minimal hand velocity (when all the parts are staying relatively still) are flagged as possible segmentation points (analogous to Sezgin's speed points). A second source of candidate segmentation points is a cosine metric, which compares the hand's elements at the current point with those in a window of +/- n points using an inner product. If the change in angle is above a threshold, the point is flagged as a candidate (analogous to Sezgin's curvature points). Erroneous velocity candidates are thrown out if the velocity change from t-n to t, or from t to t+n, is not large enough. &lt;br /&gt;&lt;br /&gt;Which hands are used (both vs. one hand, right vs. left hand) is determined by comparing the velocities of the two hands, both on "which max is greater" (Eq 3) and "avg squared difference in velocity &gt;? 1" (Eq 4). Thresholds are trained for these values.&lt;br /&gt;&lt;br /&gt;Using their method, they segment words correctly 80% of the time and misclassify transitions as words 11% of the time. They say they are able to improve the classification accuracy of words (78 to 87) and sentences (56 to 58).&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;So basically they're using Sezgin's methods. I don't like all the thresholds. They should have done something more statistically valid and robust, since this requires extensive training and is very dependent on the training set. 
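&lt;br /&gt;&lt;br /&gt;To show how many tunable knobs even a simple version has, here is a minimal Python/NumPy sketch of the two candidate detectors described above. The first finds local minima of the mean hand velocity and keeps them only if the velocity rises enough within n frames on both sides. The second flags points where the direction of motion before and after the point changes sharply. Several details are assumptions, not Sagawa and Takeuchi's exact formulation: the array layout, the use of the mean hand position for the cosine test, and the default thresholds (n, min_rise, max_cos).

```python
import numpy as np


def hand_velocity(positions):
    """Mean displacement of all hand parts between consecutive frames.

    positions: array of shape [T, n_parts, 3]; returns an array of length T-1,
    where v[i] is the average movement from frame i to frame i+1.
    """
    pos = np.asarray(positions, dtype=float)
    steps = np.linalg.norm(np.diff(pos, axis=0), axis=-1)  # [T-1, n_parts]
    return steps.mean(axis=1)


def velocity_candidates(v, n=3, min_rise=0.05):
    """Local minima of v, kept only if the velocity rises by at least
    min_rise between t and t-n and between t and t+n (hypothetical values)."""
    v = np.asarray(v, dtype=float)
    candidates = []
    for t in range(n, len(v) - n):
        is_min = v[t - 1] > v[t] and v[t + 1] >= v[t]
        big_rise = (v[t - n] - v[t] >= min_rise) and (v[t + n] - v[t] >= min_rise)
        if is_min and big_rise:
            candidates.append(t)
    return candidates


def direction_candidates(positions, n=3, max_cos=0.5):
    """Frames where the motion over [t-n, t] and over [t, t+n] points in
    clearly different directions (cosine below max_cos)."""
    centre = np.asarray(positions, dtype=float).mean(axis=1)  # [T, 3]
    candidates = []
    for t in range(n, len(centre) - n):
        before = centre[t] - centre[t - n]
        after = centre[t + n] - centre[t]
        denom = np.linalg.norm(before) * np.linalg.norm(after)
        if denom == 0:
            continue  # hand not moving on one side; no direction to compare
        if max_cos > np.dot(before, after) / denom:
            candidates.append(t)
    return candidates
```

Even this toy version needs three hand-set values, and each would have to be tuned on training data.&lt;br /&gt;&lt;br /&gt;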
Furthermore, different signs and gestures will likely need different thresholds, so one set of thresholds trained on all the data will consistently segment some of them wrong. I guess this is why their accuracy is less than stellar.&lt;br /&gt;&lt;br /&gt;Basically, they just look at which hand is moving more, or whether both hands are moving about the same, to tell one- from two-handed and right- from left-handed signs. Meh. Not that impressed.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;BibTeX&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;@inproceedings{796189,&lt;br /&gt; author = {Hirohiko Sagawa and Masaru Takeuchi},&lt;br /&gt; title = {A Method for Recognizing a Sequence of Sign Language Words Represented in a Japanese Sign Language Sentence},&lt;br /&gt; booktitle = {FG '00: Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition 2000},&lt;br /&gt; year = {2000},&lt;br /&gt; isbn = {0-7695-0580-5},&lt;br /&gt; pages = {434},&lt;br /&gt; publisher = {IEEE Computer Society},&lt;br /&gt; address = {Washington, DC, USA},&lt;br /&gt; }&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-365706546604683534?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/365706546604683534/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=365706546604683534&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/365706546604683534'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/365706546604683534'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/02/sagawa-recognizing-sequence-japanese.html' title='Sagawa - Recognizing 
Sequence Japanese Sign Lang. Words'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-3438056014245889755</id><published>2008-02-27T08:40:00.005-06:00</published><updated>2008-03-30T17:00:40.494-05:00</updated><title type='text'>Storring - Computer Vision-Based Gesture Rec for Augmented Reality</title><content type='html'>M. Störring, T.B. Moeslund, Y. Liu, and E. Granum. "Computer Vision-based Gesture Recognition for an Augmented Reality Interface." In Proceedings of the 4th IASTED International Conference on Visualization, Imaging, and Image Processing. Marbella, Spain, Sep 2004: 766-71.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;The authors present a vision-based system for simple posture recognition: a hand with 1-5 digits extended. They do skin color matching and segmentation on normalized RGB values to get a blob of the hand. The colors are modeled as 2D Gaussians (chromaticity in the r and g dimensions) and clustered. They choose the skin-colored cluster whose pixel count falls between minimum and maximum thresholds. Filtering is used to make the hand blobs contiguous. They locate the center of the hand blob and, using concentric circles with expanding radii, count the number of extended digits. 
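&lt;br /&gt;&lt;br /&gt;The concentric-circle step is simple enough to sketch, which also makes the questions in the discussion below more concrete. This hypothetical Python/NumPy version takes a binary hand mask and samples points on circles of the given radii around a centre. By default the centre is the blob centroid. That default is an assumption, because the summary does not say how the centre is found. The function counts separate runs of skin pixels on each circle as extended digits. It also assumes the mask contains no forearm, which would otherwise show up as an extra run.

```python
import numpy as np


def count_extended_digits(mask, radii, center=None, n_samples=360):
    """Count extended digits by sampling concentric circles around the hand.

    mask:   2D boolean array, True for hand (skin) pixels.
    radii:  circle radii in pixels; they should lie outside the palm.
    center: (row, col) of the circles; defaults to the blob centroid,
            which is an assumption, not the paper's stated method.
    Returns the largest number of separate skin runs seen on any circle.
    """
    mask = np.asarray(mask, dtype=bool)
    if center is None:
        rows, cols = np.nonzero(mask)
        center = (rows.mean(), cols.mean())
    cy, cx = center
    angles = np.linspace(0.0, 2 * np.pi, n_samples, endpoint=False)
    best = 0
    for r in radii:
        ys = np.clip(np.round(cy + r * np.sin(angles)).astype(int), 0, mask.shape[0] - 1)
        xs = np.clip(np.round(cx + r * np.cos(angles)).astype(int), 0, mask.shape[1] - 1)
        ring = mask[ys, xs].astype(int)
        # Each finger crossing the circle is a run of 1s; count the 0 -> 1
        # transitions around the closed ring (wrapping last sample to first).
        runs = int(np.sum(np.diff(np.concatenate([ring[-1:], ring])) == 1))
        best = max(best, runs)
    return best
```

Two fingers held together merge into a single run on every circle, which is the spreading problem raised in the discussion.&lt;br /&gt;&lt;br /&gt;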
The digit count is their classification.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;If you have a tiny or huge hand, or the camera is zoomed in or out, your hand may cover too many or too few pixels to fall within their limits, so it won't get picked up correctly by the skin detection/hand tracking part of the algorithm.&lt;br /&gt;&lt;br /&gt;Fingers must be spread far enough apart for the concentric-circle method to count two of them and not just one.&lt;br /&gt;&lt;br /&gt;I'd like details on how they find the center of the hand for their circles. I'd also like details on how they identify different fingers. For example, for their "click" gesture, do they just assume that a digit at 90 degrees to the hand that's seen/unseen/seen is a thumb moving? How do they sample the frames to get those three states?&lt;br /&gt;&lt;br /&gt;First sentence: "less obtrusive." Figure 1: crazy 50-pound head sucker thing. Give me my keyboard and mouse back.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;BibTeX&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;@inproceedings{storring2004visionGestureRecAR&lt;br /&gt;,author="M. Störring and T.B. Moeslund and Y. Liu and E. 
Granum"&lt;br /&gt;,title="Computer Vision-based Gesture Recognition for an Augmented Reality Interface"&lt;br /&gt;,booktitle="Proceedings of the 4th IASTED International Conference on Visualization, Imaging, and Image Processing"&lt;br /&gt;,address="Marbella, Spain"&lt;br /&gt;,month="Sep"&lt;br /&gt;,year="2004"&lt;br /&gt;,pages="766--771"&lt;br /&gt;}&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-3438056014245889755?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/3438056014245889755/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=3438056014245889755&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/3438056014245889755'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/3438056014245889755'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/02/storring-computer-vision-based-gesture.html' title='Storring - Computer Vision-Based Gesture Rec for Augmented Reality'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-1701597076872206313</id><published>2008-02-25T17:22:00.005-06:00</published><updated>2008-03-30T17:06:13.623-05:00</updated><title type='text'>Westeyn - GT2k</title><content type='html'>Westeyn, Tracy, Helene Brashear, Amin Atrash, and Thad Starner. "Georgia Tech Gesture Toolkit: Supporting Experiments in Gesture Recognition." 
In Proceedings of the International Conference on Perceptive and Multimodal User Interfaces 2003 (ICMI), November 2003.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Westeyn et al. present a toolkit to simplify the recognition of hand gestures using hidden Markov models. Their system, dubbed GT&lt;sup&gt;2&lt;/sup&gt;k, runs on top of the Hidden Markov Model Toolkit (HTK) used for speech recognition. It abstracts away the complexity of HMMs and of their application to speech recognition (it's been shown that speech models do well at recognizing gestures, too). You provide feature vectors, a grammar defining the classification targets and how they are related, and training examples. The system trains on these and can then be used for classification.&lt;br /&gt;&lt;br /&gt;They give several example applications that use GT&lt;sup&gt;2&lt;/sup&gt;k. The first recognizes postures performed between a camera and an array of IR LEDs, and achieves 99.2% accuracy for 8 classes. They also give examples of blink-prints, mobile sign language recognition (90%), and workshop activity recognition.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;So first off, it's neat that there is a little toolkit we can use to do hand gesture recognition. Being built on top of an HMM toolkit for speech recognition isn't too scary, since HMMs pretty much pwn speech rec. It also makes HMMs more available to the masses.&lt;br /&gt;&lt;br /&gt;That being said, I don't feel like the authors really applied their toolkit to any example that is truly worthy of the power of an HMM. The driving example could be handled by a simple neural network and is crazy easy even with template matching. The blink-print thing, besides being dumb, is just short/long sequence identification and template matching / nearest neighbor. Telesign... their grammar looks like you'd have to specify all possible orderings of words (UGH!). I think GT2K has promise in this area, however. Workshop activity recognition... 
besides the fact that the sensor data is able to classify activities, which is neat, this application is absurd.&lt;br /&gt;&lt;br /&gt;However, again I'd like to clarify that the GT2K is a great idea and I'd like to use it more, hopefully with more worthy applications.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;BibTeX&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;@inproceedings{958452,&lt;br /&gt; author = {Tracy Westeyn and Helene Brashear and Amin Atrash and Thad Starner},&lt;br /&gt; title = {Georgia tech gesture toolkit: supporting experiments in gesture recognition},&lt;br /&gt; booktitle = {ICMI '03: Proceedings of the 5th international conference on Multimodal interfaces},&lt;br /&gt; year = {2003},&lt;br /&gt; isbn = {1-58113-621-8},&lt;br /&gt; pages = {85--92},&lt;br /&gt; location = {Vancouver, British Columbia, Canada},&lt;br /&gt; doi = {http://doi.acm.org/10.1145/958432.958452},&lt;br /&gt; publisher = {ACM},&lt;br /&gt; address = {New York, NY, USA},&lt;br /&gt; }&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-1701597076872206313?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/1701597076872206313/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=1701597076872206313&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/1701597076872206313'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/1701597076872206313'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/02/westeyn-gt2k.html' title='Westeyn - GT2k'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-4003699860069311369</id><published>2008-02-22T11:08:00.005-06:00</published><updated>2008-02-25T15:57:32.535-06:00</updated><title type='text'>Lichtenauer - 3D Visual recog. of NGT sign production</title><content type='html'>J.F. Lichtenauer, G.A. ten Holt, E.A. Hendriks, M.J.T. Reinders. "3D Visual Detection of Correct NGT Sign Production." Thirteenth Annual Conference of the Advanced School for Computing and Imaging, Heijen, The Netherlands, June 13-15, 2007.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Lichtenauer et al. discuss a system for recognizing Dutch sign language (NGT) gestures using two cameras and an image processing approach. The system is initialized semi-automatically by sampling the skin color of the face. The hands are then located in the image by finding the pixels that have the same color as the face. Gesture segmentation is manually enforced, with the hands at rest on the table between gestures. The gestures are turned into feature vectors of movement/angle through space (blob tracking) and time, and compared to a reference gesture per class using dynamic time warping. The features are classified independently of one another, and the per-feature results for each class are summed and averaged, giving one probability per class. If that probability is above a certain threshold, the gesture is labeled as that class. They report a 95% true positive rate and a 5% false positive rate.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;This method seems pretty hardcore on the computation, since they're running a classifier for each of the ~5000 features. 
I don't know if that's how all DTW stuff works, but I think you could do something to dramatically reduce the amount of error.&lt;br /&gt;&lt;br /&gt;If you wear a short-sleeved shirt, will the algorithm break when it starts trying to track your forearms or elbows? It's just using skin color, so I think it might.&lt;br /&gt;&lt;br /&gt;They use the naive Bayes assumption to treat all their features as independent of each other. I think this is pretty safe to do, especially as it simplifies computation. They do mention that even though some features might be correlated, they've added features that capture this correlation explicitly, pulling it out of the space "between" features (that's a hokey way to put it, sorry).&lt;br /&gt;&lt;br /&gt;They don't report accuracy, only true and false positive rates. This is pretty much bogus, as far as I'm concerned, since it doesn't tell you much about how accurately their system recognizes gestures.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;BibTeX&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;@proceedings{lightenauer2007ngtSign&lt;br /&gt;,author="J.F. Lichtenauer and G.A. ten Holt and E.A. Hendriks and M.J.T. 
Reinders"&lt;br /&gt;,title="{3D} Visual Detection of Correct {NGT} Sign Production"&lt;br /&gt;,booktitle="Thirteenth Annual Conference of the Advanced School for Computing and Imaging"&lt;br /&gt;,address="Heijen, The Netherlands"&lt;br /&gt;,year="2007"&lt;br /&gt;,month="June"&lt;br /&gt;}&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-4003699860069311369?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/4003699860069311369/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=4003699860069311369&amp;isPopup=true' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/4003699860069311369'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/4003699860069311369'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/02/lichtenauer-3d-visual-recog-of-ngt-sign.html' title='Lichtenauer - 3D Visual recog. of NGT sign production'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-6762659414710217172</id><published>2008-02-22T09:56:00.002-06:00</published><updated>2008-02-22T11:03:19.193-06:00</updated><title type='text'>LaViola - Survey of Haptics</title><content type='html'>LaViola, J. J. 1999. A Survey of Hand Posture and Gesture Recognition Techniques and Technology. Technical Report. UMI Order Number: CS-99-11, Brown University. 
&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;We read only chapters 3 and 4. LaViola gives a nice summary of the many methods for haptic recognition and the many domains where it can be used.&lt;br /&gt;&lt;br /&gt;Template matching (like a $1 for haptics) is easy and has been implemented with good accuracy for small sets of gestures. Feature-based classifiers, like Rubine's, have achieved very high accuracy rates and have also been used to segment gestures. PCA can be used to form "eigenpostures" and possibly to simplify data for recognition. Obviously, as we've seen many times in class, neural networks and hidden Markov models can both be used to achieve high accuracy for complex data sets, but both require extensive training and some a priori knowledge of the data set (number of hidden layers/units and number of hidden states for nets and HMMs, respectively). Instance-based learning, such as k-nearest neighbors, has been only briefly touched upon in the literature, and not much investigation has been performed. Other techniques, like using formal grammars to describe postures/gestures, are also discussed, but not much work has been done in these areas either.&lt;br /&gt;&lt;br /&gt;The application domains for hand gesture recognition are basically all the stuff we've seen in class: sign language, virtual environments, controlling robots/computer systems, and 3D modelling.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;This was a very nice overview of the field. 
I'm most interested in exploring:&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;Template matching methods and feature-based recognition (Sturman and Wexelblat)&lt;/li&gt;&lt;br /&gt;&lt;li&gt;PCA for gesture segmentation&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Using a k-nearest neighbors approach to classification&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Defining a constraint grammar to express a posture/gesture&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;All my ideas (except for the last) center on representing a posture/gesture with a vector of features. Picking good features might be hard, as it is in sketch rec, but I think that it can be done (analogous to PaleoSketch).&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;BibTeX&lt;/h3&gt;&lt;br /&gt;@techreport{864649,&lt;br /&gt; author = {Joseph J. LaViola, Jr.},&lt;br /&gt; title = {A Survey of Hand Posture and Gesture Recognition Techniques and Technology},&lt;br /&gt; year = {1999},&lt;br /&gt; source = {http://www.ncstrl.org:8900/ncstrl/servlet/search?formname=detail\&amp;id=oai%3Ancstrlh%3Abrowncs%3ABrownCS%2F%2FCS-99-11},&lt;br /&gt; publisher = {Brown University},&lt;br /&gt; address = {Providence, RI, USA},&lt;br /&gt; }&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-6762659414710217172?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/6762659414710217172/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=6762659414710217172&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/6762659414710217172'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/3062307596356887437/posts/default/6762659414710217172'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/02/laviola-survey-of-haptics.html' title='LaViola - Survey of Haptics'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-1104488274975573486</id><published>2008-02-22T09:45:00.003-06:00</published><updated>2008-02-22T09:55:34.045-06:00</updated><title type='text'>Komura - Real-Time Locomotion w/ Data Gloves</title><content type='html'>Komura, T. and Lam, W. 2006. Real-time locomotion control by sensing gloves: Research Articles. Comput. Animat. Virtual Worlds 17, 5 (Dec. 2006), 513-525. DOI= http://dx.doi.org/10.1002/cav.v17:5 &lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Komura and Lam present a method for mapping the movements of the fingers and hand (measured with a data glove) to the control of a character in a 3D game, making the character walk, run, hop, and turn. They achieve this in two steps. The first is to show the user an on-screen example of a character walking and have the user mimic the action. During this stage, the glove data is calibrated to determine the periodicity of the movements, etc., and a mapping from the finger movements to the movements of the on-screen character is built. The movement of each finger is compared to the movement of each end-point of the figure (legs, chest, etc.), and each body part is mapped to the finger whose feature vector (velocities and directions) has the smallest angle to that body part's feature vector. The function relating the amount of finger movement to the amount of body movement is then simply fit with a B-spline regression. 
In the second step, the user moves his hands around and makes the character walk/run/jump through the learned mapping. They perform a user study in which people steer a character through a narrow set of passages, and find that users are just as fast with the keyboard as with the glove, but tend to be more accurate, with fewer wall collisions, when using the glove. They attribute this to the intuitive interface of using one's hands to control a figure.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;The one thing I did like about this paper was the way they mapped fingers to a set of pre-defined motions and then mapped the movements of the fingers to those of the character. It seemed neat, but I don't think it has any research merit.&lt;br /&gt;&lt;br /&gt;Why do you even have to learn anything? Everything they do is so rigid and pre-defined anyway, like the mapping of 2/4 legs of a dog to each finger with a fixed way of computing the period delay between front/rear legs. Why not just force a certain leg on the character to be a certain finger and avoid mapping altogether? Maybe you could still fit the B-spline to get a better idea of sensitivity, but the whole cosine thing is completely unnecessary.&lt;br /&gt;&lt;br /&gt;They only use one test and it's very limited, so I don't think they can make the claim that "is is possible to conclude that the sensing glove controlling more effective when controlling the character more precisely" [sic]. I also want standard deviations and significance levels for Tables 1 and 2, though for such a small sample size these might not be meaningful.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;BibTeX&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;@article{1182569,&lt;br /&gt; author = {Taku Komura and Wai-Chun Lam},&lt;br /&gt; title = {Real-time locomotion control by sensing gloves: Research Articles},&lt;br /&gt; journal = {Comput. Animat. 
Virtual Worlds},&lt;br /&gt; volume = {17},&lt;br /&gt; number = {5},&lt;br /&gt; year = {2006},&lt;br /&gt; issn = {1546-4261},&lt;br /&gt; pages = {513--525},&lt;br /&gt; doi = {http://dx.doi.org/10.1002/cav.v17:5},&lt;br /&gt; publisher = {John Wiley and Sons Ltd.},&lt;br /&gt; address = {Chichester, UK, UK},&lt;br /&gt; }&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-1104488274975573486?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/1104488274975573486/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=1104488274975573486&amp;isPopup=true' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/1104488274975573486'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/1104488274975573486'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/02/komura-real-time-locomotion-w-data.html' title='Komura - Real-Time Locomotion w/ Data Gloves'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-8268599167110765323</id><published>2008-02-20T15:42:00.004-06:00</published><updated>2008-02-20T16:21:08.586-06:00</updated><title type='text'>Freeman - Television Control</title><content type='html'>Freeman, William T. and Craig D. Weissman. "Television Control by Hand Gestures." In Proceedings of the IEEE Intl. Wkshp. 
on Automatic Face and Gesture Recognition, Zurich, June, 1995.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Freeman and Weissman present a system for controlling a television (channel and volume) using "hand gestures." The user holds up a hand in front of a television/computer combo. The computer recognizes an open hand using image processing techniques. When an open hand is seen, a menu opens with controls (buttons/slider bar) for the channel and volume. The user moves the hand around and hovers it over the controls to activate them. To stop, the user closes the hand or otherwise removes the open hand from the camera's field of view (FOV). They recognize an open palm using normalized correlation (a cosine similarity metric) between a pre-defined template image of a palm and the camera image at every possible offset.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Not in the mood to write decent prose, so here's a list.&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;Is natural language really that much better? First, it contains a lot of ambiguity that a mouse and keyboard don't have. Second, you'd have just as many problems defining a vocabulary of commands with language as you would with gestures, especially since there are so many words/synonyms/etc.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Their example of a 'complicated gesture' is a goat shadow puppet. Seriously? I think this is a little exaggerated and a lot ridiculous.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;These aren't really gestures. It's just image tracking that boils down to nothing more than a mouse. What have you saved? Just buy 10 more remotes and glue them to things so you have one in every sitting spot and they can't be lost.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;I don't know the image rec. research area, so I can't comment too much on their algorithm. 
But this seems like it would be super slow (taking all possible offsets) and have issues with scaling (what if the hand template is the wrong size, esp too small for the actual hand in the camera image).&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;BibTeX&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;@proceedings{freeman1995televisionGestures&lt;br /&gt;,author="William T. Freeman and Craig D. Weissman"&lt;br /&gt;,title="Television Control by Hand Gestures"&lt;br /&gt;,booktitle="IEEE Intl. Wkshp. on Automatic Face and Gesture Recognition"&lt;br /&gt;,address="Zurich"&lt;br /&gt;,year="1995"&lt;br /&gt;,month="June"&lt;br /&gt;}&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-8268599167110765323?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/8268599167110765323/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=8268599167110765323&amp;isPopup=true' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/8268599167110765323'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/8268599167110765323'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/02/freeman-television-control.html' title='Freeman - Television Control'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-1444806092307226163</id><published>2008-02-13T15:41:00.002-06:00</published><updated>2008-02-13T17:02:46.079-06:00</updated><title type='text'>Marsh - Shape your imagination</title><content type='html'>Marsh, T.; Watt, A., "Shape your imagination: iconic gestural-based interaction," Virtual Reality Annual International Symposium, 1998. Proceedings., IEEE 1998, pp. 122-125, 1998&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Marsh and Watt present findings from a user study examining how people use gestures to describe objects nonverbally. They describe how iconic gestures (those that immediately and clearly stand for something) fall into two camps: substitutive (the hands match the shape or form of the object) and virtual (the hands outline or trace a picture of the shape/object).&lt;br /&gt;&lt;br /&gt;For their study, they used 12 subjects of varying backgrounds. They had 15 shapes from two categories: primitive (circle, triangle, sphere, cube, etc.) and complex (table, car, French baguette, etc.). The shapes were written on index cards and presented in the same order to each user. The users were then told to describe the shapes using non-verbal communication. Of all the gestures, 75% were virtual. For the 2D shapes, 72% of people used one hand, while the 3D objects were all described with two hands. For the complex shapes, iconic gestures were either replaced or accompanied by pantomimic (how the object is used) or deictic (pointing to something) gestures. Some complex shapes were too difficult for users to express at all (4 users failed on chair, and 1 each on football, table, and baguette). They also found that 2D shapes are easier to express than 3D ones.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;I really liked this paper. 
While it was a little short, I think it was neat that they were able to categorize the gestures that people made. This reminded me a lot of Alvarado et al.'s paper, where they performed a user study about how people draw. I think it's especially useful to see that if we want to do anything useful with haptics, we have to enable the users to use /both/ hands.&lt;br /&gt;&lt;br /&gt;Some things:&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;How did they pick their shapes, especially the complex ones? I mean, come on, French baguette? Then again, this is a really good example because it's friggin hard to mime.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;They note that most of the complex objects are too difficult to express with iconic gestures alone. That's why sign languages aren't that simple to learn: not everything can be expressed easily with just iconic gestures. The paper does well to point this out and make it clear, even though it seems obvious. It also seems to motivate the need for multimodal input in complex recognition domains.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;They remark that 3D is harder than 2D. Besides the fact that this claim is obvious and almost a bit silly to make, it does seem that there are 2D shapes that would be very difficult to express. For example: Idaho. I wonder if their comparison between 2D and 3D here is a fair one. Obviously adding another dimension is going to make things much more difficult, but they're comparing things like a circle to things like a French baguette.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Finally, who decides if a gesture is iconic or not? Isn't this shaped by experience and perception?&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;BibTeX&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;@ARTICLE{658465,&lt;br /&gt;title={Shape your imagination: iconic gestural-based interaction},&lt;br /&gt;author={Marsh, T. and Watt, A.},&lt;br /&gt;journal={Virtual Reality Annual International Symposium, 1998. 
Proceedings., IEEE 1998},&lt;br /&gt;year={1998},&lt;br /&gt;volume={},&lt;br /&gt;number={},&lt;br /&gt;pages={122-125},&lt;br /&gt;keywords={computer graphics, graphical user interfaces, 3D computer generated graphical environments, 3D spatial information, human computer interaction, iconic gestural-based interaction, iconic hand gestures, object manipulation, shape manipulation, spatial information},&lt;br /&gt;doi={10.1109/VRAIS.1998.658465}, }&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-1444806092307226163?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/1444806092307226163/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=1444806092307226163&amp;isPopup=true' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/1444806092307226163'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/1444806092307226163'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/02/marsh-shape-your-imagination.html' title='Marsh - Shape your imagination'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-233306672886484178</id><published>2008-02-13T15:11:00.002-06:00</published><updated>2008-02-13T15:33:35.877-06:00</updated><title type='text'>Kim - Gesture Rec. 
for Korean Sign Language</title><content type='html'>Jong-Sung Kim; Won Jang; Zeungnam Bien, "A dynamic gesture recognition system for the Korean sign language (KSL)," Systems, Man, and Cybernetics, Part B, IEEE Transactions on, vol. 26, no. 2, pp. 354-359, Apr 1996&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Kim et al. present a system for recognizing a subset of gestures in KSL. They say KSL can be expressed with 31 distinct gestures, and they choose 25 of them for this initial study. They use a Data-Glove, which gives 2 bend values for each finger, and a Polhemus tracker (standard 6 DOF) to get information about each hand.&lt;br /&gt;&lt;br /&gt;They recognize signs using the following recipe:&lt;br /&gt;&lt;ol&gt;&lt;br /&gt;&lt;li&gt;Bin the movements along each axis (bins 4 inches wide) to filter/smooth the data&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Vector quantize the movements of each hand into one of 10 "directional" classes that describe how the hand(s) is(are) moving.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Feed the glove sensor information into a fuzzy min-max neural network to classify it as one of 14 postures, using a rejection threshold in case it matches none of them&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Use the direction and posture results to determine the intended sign&lt;/li&gt;&lt;br /&gt;&lt;/ol&gt;&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;They remark that many of the classes are not linearly separable. This is a problem in many domains. Support vector machines (with nonlinear kernels) can sometimes do a very good job of separating such data. I wonder why no one has used them so far. Probably because they're fairly complex.&lt;br /&gt;&lt;br /&gt;I also like the idea of thinking of gestures as a signal. I don't know why, but this analogy has escaped me so far. There is a technique for detecting "interesting anomalies" in signals using PCA. 
I wonder if it would help with the segmentation problem.&lt;br /&gt;&lt;br /&gt;How do they determine the initial position for the glove coordinates? If they get it wrong, all their measurements will be off and the vector quantization of their movements will probably fail. They should probably just skip this whole initial starting point thing and use the change from the last position. Maybe that's what they really mean, but it's unclear.&lt;br /&gt;&lt;br /&gt;Also, it seems like their method for filtering/smoothing the position/movement data by binning the values is a fairly hackish technique. There are robust methods for filtering noisy data (a Kalman filter, or even a simple moving average) that should have been used instead.&lt;br /&gt;&lt;br /&gt;And finally, their results. They report 85%, which doesn't seem /too/ bad for a first try. But then they try to rationalize that 85% is good enough, saying that "the deaf-mute who [sic] use gestures often misunderstand each other." Well that's a little condescending, now, isn't it? They also blame everything except their own algorithm, and "find that abnormal motions in the gestures and postures, and errors of sensors are partly responsible for the observed mis-classification." So you want things to work perfectly for you? You want life to play fair? News flash: if things were perfect, you would be out of a job and there would be no meaning or reason for the paper you just wrote. Things are hard. Deal with it. Life's not perfect, my sensors won't give me perfect data, and I can't draw a perfectly straight line by hand. That's no excuse for not building an algorithm that accounts for those imperfections.&lt;br /&gt;&lt;br /&gt;Also, how did they combine the quantized movement data (the ten movement classes) with the posture classifications? The fuzzy min-max network handled the postures, but not the combination, right? 
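&lt;br /&gt;&lt;br /&gt;To make steps 1 and 2 of the recipe a bit more concrete, here's a minimal Python sketch of how binning and direction quantization could fit together. It's my own guess under stated assumptions, not Kim et al.'s implementation: the 4-inch bin width comes from the summary above, but the six-entry direction codebook is hypothetical (the paper's ten classes aren't reproduced here), and BIN_WIDTH_IN, CODEBOOK, bin_position, quantize_direction, and quantize_track are names I made up for illustration. Each displacement between binned tracker samples gets the codeword with the largest dot product, which, because the codewords are unit vectors, is the one with the highest cosine similarity.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;
```python
import math

# Bin width along each axis, in inches (the 4-inch figure from the summary).
BIN_WIDTH_IN = 4.0

# Hypothetical unit-length direction codewords; Kim et al.'s ten classes are not reproduced.
CODEBOOK = {
    "right": (1.0, 0.0, 0.0),
    "left": (-1.0, 0.0, 0.0),
    "up": (0.0, 1.0, 0.0),
    "down": (0.0, -1.0, 0.0),
    "away": (0.0, 0.0, 1.0),
    "toward": (0.0, 0.0, -1.0),
}


def bin_position(pos, width=BIN_WIDTH_IN):
    """Replace each coordinate with the index of its fixed-width bin (coarse smoothing)."""
    return tuple(math.floor(c / width) for c in pos)


def quantize_direction(prev_pos, cur_pos, width=BIN_WIDTH_IN):
    """Label the movement between two tracker samples with the nearest codeword.

    Returns "still" if both samples land in the same bins; otherwise returns the
    codeword with the largest dot product against the binned displacement, which
    for unit-length codewords is the one with the highest cosine similarity.
    """
    d = [b - a for a, b in zip(bin_position(prev_pos, width), bin_position(cur_pos, width))]
    if not any(d):
        return "still"
    return max(CODEBOOK, key=lambda label: sum(x * c for x, c in zip(d, CODEBOOK[label])))


def quantize_track(samples, width=BIN_WIDTH_IN):
    """Turn a sequence of (x, y, z) positions into a sequence of direction labels."""
    return [quantize_direction(p, q, width) for p, q in zip(samples, samples[1:])]


if __name__ == "__main__":
    print(quantize_track([(0, 0, 0), (5, 0, 0), (5, 9, 0), (5, 10, 0)]))
```
&lt;/pre&gt;Running it prints ['right', 'up', 'still']. One side effect worth noticing: with floor-based bins, quantize_direction((0, 0, 0), (3, 0, 0)) returns "still" while quantize_direction((0, 0, 0), (-3, 0, 0)) returns "left", since a move of the same size may or may not cross a bin edge depending on where it starts and which way it goes. That's part of why binning feels like a hackish filter to me.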
&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;BibTeX&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;@ARTICLE{485888,&lt;br /&gt;title={A dynamic gesture recognition system for the Korean sign language (KSL)},&lt;br /&gt;author={Jong-Sung Kim and Won Jang and Zeungnam Bien},&lt;br /&gt;journal={Systems, Man, and Cybernetics, Part B, IEEE Transactions on},&lt;br /&gt;year={1996},&lt;br /&gt;month={Apr},&lt;br /&gt;volume={26},&lt;br /&gt;number={2},&lt;br /&gt;pages={354-359},&lt;br /&gt;keywords={data gloves, fuzzy neural nets, pattern recognition, Korean sign language, data-gloves, dynamic gesture recognition system, fuzzy min-max neural network, online pattern recognition},&lt;br /&gt;doi={10.1109/3477.485888},&lt;br /&gt;ISSN={1083-4419}, }&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-233306672886484178?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/233306672886484178/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=233306672886484178&amp;isPopup=true' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/233306672886484178'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/233306672886484178'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/02/kim-gesture-rec-for-korean-sign.html' title='Kim - Gesture Rec. 
for Korean Sign Language'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-7882566082927355048</id><published>2008-02-11T17:14:00.000-06:00</published><updated>2008-02-11T17:19:26.445-06:00</updated><title type='text'>Gesture descriptions for Trumpet Fingerings</title><content type='html'>I was going to do ASL fingerspelling as well, but others already did it, so I don't feel the need to repeat what they said about it. Mine was basically the same as theirs. So instead, here's just the trumpet stuff.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://students.cs.tamu.edu/jbjohns/files/trumpetFingeringGestures.pdf"&gt;trumpetFingeringGestures.pdf&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-7882566082927355048?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/7882566082927355048/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=7882566082927355048&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/7882566082927355048'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/7882566082927355048'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/02/gesture-descriptions-for-trumpet.html' title='Gesture descriptions for Trumpet Fingerings'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-350090118574143573</id><published>2008-02-11T16:35:00.000-06:00</published><updated>2008-02-11T16:38:50.996-06:00</updated><title type='text'>Natural Language Descriptions for 5 easy ASL signs</title><content type='html'>Natural language descriptions of 5 common ASL signs&lt;br /&gt;Joshua Johnston&lt;br /&gt;Haptics&lt;br /&gt;11 February, 2008&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold; text-decoration:underline"&gt;HELLO&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Put the right hand into the shape of a “B”--all four fingers extended and placed together vertically from the palm, with the thumb bent and crossing the palm. Raise the b-hand and put the tip of the forefinger to your right temple. Move the b-hand away from the head to the right with a quick gesture.&lt;br /&gt;&lt;br /&gt;(as if waving)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold; text-decoration:underline"&gt;NAME&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Put both hands into the sign for the letter “U”--the fore- and middle fingers of each hand straight, extended from the palm, and touching, with the remaining fingers and thumb curled together in front of the palm (like a “2” or “scissors” with the fingers together). Bring the u-hands in front of the body, fingers pointing parallel to the ground, in the shape of an “X.” Tap the right hand's extended fingers on top of the left hand's extended fingers twice.&lt;br /&gt;&lt;br /&gt;(sign your name on the “X”)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold; text-decoration:underline"&gt;NICE&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Put all the fingers (and thumb) together and extended on both hands to form a flat surface. Put the left hand palm up in front of the body and hold it still. 
Take the right hand, put its palm against the palm of the left hand, then in one smooth motion slide the right hand along the left hand's fingers and off the end.&lt;br /&gt;&lt;br /&gt;(this sign also means “clean” and “pure”, as if wiping the dirt off one hand with the other)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold; text-decoration:underline"&gt;MEET&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Put both hands into the sign for the letter “D”: the forefinger extended vertically, with the rest of the fingers and the thumb curled in front of the palm. Bring the d-hands together in front of the body so that the curled fingers touch.&lt;br /&gt;&lt;br /&gt;(d-hands represent people coming together)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold; text-decoration:underline"&gt;SANDWICH&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Put both hands together, palms flat and fingers/thumb extended and together. Bring the tips of the fingers up and touch them to your mouth.&lt;br /&gt;&lt;br /&gt;(hands are the bread, and you're eating it)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-350090118574143573?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/350090118574143573/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=350090118574143573&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/350090118574143573'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/350090118574143573'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/02/natural-language-descriptions-for-5.html' title='Natural Language 
Descriptions for 5 easy ASL signs'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-4919033612122347329</id><published>2008-02-11T15:48:00.000-06:00</published><updated>2008-02-11T15:58:37.434-06:00</updated><title type='text'>Cassandra - POMDPs</title><content type='html'>Anthony Cassandra. "A Survey of POMDP Applications." Presented at the AAAI Fall Symposium, 1998. http://pomdp.com/pomdp/papers/index.shtml, 11 Feb, 2008.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;The paper has little to say about gesture recognition. That is not surprising, since POMDPs (partially observable Markov decision processes) are mainly used for planning in artificial intelligence. Think of a robot that has a goal but only a limited visual range (it can't see behind obstructions, etc.). In this situation a POMDP might be used to choose among actions based on the robot's belief about the current state, because it cannot observe that state directly.&lt;br /&gt;&lt;br /&gt;The paper does mention machine vision and gesture recognition. There, the computer uses a POMDP to aim a camera and its fovea (a high-resolution area for fine-grained vision) at facial expressions, hand movements, etc. The fovea matters because it is small. The areas outside it are either seen at a much lower resolution (to reduce the computational burden) or not seen at all (outside the field of view). 
&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;I really don't think POMDPs can be used for our purposes in gesture recognition.&lt;br /&gt;&lt;br /&gt;However, this is a nice paper if you want examples of how POMDPs can be used in multiple domains.&lt;br /&gt;&lt;br /&gt;That is all.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;BibTeX&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;@UNPUBLISHED{cassandra1998pomdps&lt;br /&gt;,author={Anthony Cassandra}&lt;br /&gt;,title={A Survey of POMDP Applications}&lt;br /&gt;,year={1998}&lt;br /&gt;,note={Presented at the AAAI Fall Symposium}&lt;br /&gt;}&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-4919033612122347329?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/4919033612122347329/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=4919033612122347329&amp;isPopup=true' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/4919033612122347329'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/4919033612122347329'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/02/cassandra-pomdps.html' title='Cassandra - POMDPs'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-828528715284007375</id><published>2008-02-11T15:14:00.000-06:00</published><updated>2008-02-11T15:41:26.659-06:00</updated><title type='text'>Song - Forward 
Spotting Accumulative HMMs</title><content type='html'>Daehwan Kim; Daijin Kim, "An Intelligent Smart Home Control Using Body Gestures," Hybrid Information Technology, 2006. ICHIT'06. Vol 2. International Conference on , vol.2, no., pp.439-446, Nov. 2006&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Song and Kim present an algorithm for segmenting a stream of gestures and recognizing the segmented gestures. They take a sliding window of postures (observations) from the stream and feed it into an HMM system that has one model per gesture class plus one "Non-Gesture" HMM. A gesture is considered to have started if the maximum probability among the gesture HMMs is greater than the probability of the Non-Gesture HMM. They call the difference between the two (max gesture probability minus non-gesture probability) the competitive differential observation probability, or CDOP. A positive value means gesture, a negative value means non-gesture, and a zero crossing marks the start or end of a gesture.&lt;br /&gt;&lt;br /&gt;Once a gesture is observed to have started, they begin classifying the gesture segments (segmenting the segmented gesture). They feed the segments into the HMMs and get a classification for each segment. Once the gesture is found to have ended (the CDOP drops below 0, i.e., the stream becomes a non-gesture), they look at the classification results for all the segments and take a majority vote to determine the class of the whole gesture.&lt;br /&gt;&lt;br /&gt;So we have a sliding window. Within that window, we decide that a gesture starts and later see that it ends. Between the start and end points, we segment the gesture stream further. Say there are 3 segments. Then we'd classify {1}, {12}, and {123} (segment 1, segments 1-2, and segments 1-3). Pretend {1} and {123} were "OPEN CURTAINS" and {12} was "CLOSE CURTAINS." 
After the end of the gesture, the majority vote would label the whole gesture "OPEN CURTAINS."&lt;br /&gt;&lt;br /&gt;They give some results, which seem to show that their automatic method performs better than a manual one, but it's not clear what the manual method is. They seem to get about 95% accuracy classifying 8 arm gestures for opening/closing curtains and turning the lights on/off.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;So basically they just use a probabilistic threshold to decide whether a gesture has started, based on whether an observation is classified as a non-gesture (like Iba's use of a wait state when recognizing robot gestures). So don't call it the CDOP. Call it a "junk" class or "non-gesture" class. They made it much harder to understand than it is.&lt;br /&gt;&lt;br /&gt;When they give their results in Figure 5 and show the curves for manual segmentation, what the heck does \theta mean? This wasn't explained and makes their figure all but useless.&lt;br /&gt;&lt;br /&gt;So this seems like a decent method for segmenting gestures...10 years ago. Iba had almost exactly the same thing in his robot gesture recognition system, and I'm sure he wasn't the first. Decent results, I think (I can't really interpret their graph), but nothing really noteworthy.&lt;br /&gt;&lt;br /&gt;The only thing they do differently is take a majority vote over the sub-segments of each segmented gesture. Yeah, confusing. I'm not sure how much this improves recognition, as they did not compare results with and without it. It seems to me that it would only add computation time for gains that probably aren't significant.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;BibTeX&lt;/h3&gt;&lt;br /&gt;@ARTICLE{song2006forwardSpottingAccumulativeHMM,&lt;br /&gt;title={An Intelligent Smart Home Control Using Body Gestures},&lt;br /&gt;author={Daehwan Kim and Daijin Kim},&lt;br /&gt;journal={Hybrid Information Technology, 2006. ICHIT'06. Vol 2. 
International Conference on},&lt;br /&gt;year={Nov. 2006},&lt;br /&gt;volume={2},&lt;br /&gt;number={},&lt;br /&gt;pages={439-446},&lt;br /&gt;doi={10.1109/ICHIT.2006.253644},&lt;br /&gt;ISSN={}, }&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-828528715284007375?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/828528715284007375/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=828528715284007375&amp;isPopup=true' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/828528715284007375'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/828528715284007375'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/02/song-forward-spotting-accumulative-hmms.html' title='Song - Forward Spotting Accumulative HMMs'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-140513682949909964</id><published>2008-02-08T10:20:00.000-06:00</published><updated>2008-02-08T10:26:51.056-06:00</updated><title type='text'>Ip - Cyber Composer</title><content type='html'>Ip, H.H.S.; Law, K.C.K.; Kwong, B., "Cyber Composer: Hand Gesture-Driven Intelligent Music Composition and Generation," Multimedia Modelling Conference, 2005. MMM 2005. Proceedings of the 11th International , vol., no., pp. 46-52, 12-14 Jan. 2005&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Ip et al. 
describe their Cyber Composer system. The system combines rules of music theory with gesture recognition to let users create dynamic music using hand gestures. It allows control of tempo/rhythm, pitch, and dynamics/volume, and even adds a second instrument and harmony. With the help of various theory rules, including chord progression and harmonics, they assert that their system can produce "arousing" musical pieces.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Although not much is given in the way of technical details (well, nothing, actually), this is a good proof of concept. To me, the gestures seem intuitive, even if they are a little convoluted, since the same type of gesture may do different things depending on context. With a little more gesture recognition and control over the final product, I think this would make a good class project. Maybe it could work more like a real conductor, where different instrument groups are located in space, and you can point at them and direct them to modify group dynamics. Who knows.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;BibTeX&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;@ARTICLE{ip2005cyberComposer,&lt;br /&gt;title={Cyber Composer: Hand Gesture-Driven Intelligent Music Composition and Generation},&lt;br /&gt;author={ Ip, H.H.S. and Law, K.C.K. and Kwong, B.},&lt;br /&gt;journal={Multimedia Modelling Conference, 2005. MMM 2005. Proceedings of the 11th International},&lt;br /&gt;year={12-14 Jan. 
2005},&lt;br /&gt;volume={},&lt;br /&gt;number={},&lt;br /&gt;pages={ 46-52},&lt;br /&gt;doi={10.1109/MMMC.2005.32},&lt;br /&gt;ISSN={1550-5502 }, }&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-140513682949909964?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/140513682949909964/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=140513682949909964&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/140513682949909964'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/140513682949909964'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/02/ip-cyber-composer.html' title='Ip - Cyber Composer'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-6778530728789300827</id><published>2008-02-06T16:02:00.000-06:00</published><updated>2008-02-06T16:50:26.938-06:00</updated><title type='text'>Li - Similarity Measure (SVD angular) for Stream Segmentation and Recognition</title><content type='html'>Li, C. and Prabhakaran, B. 2005. A similarity measure for motion stream segmentation and recognition. In Proceedings of the 6th international Workshop on Multimedia Data Mining: Mining integrated Media and Complex Data (Chicago, Illinois, August 21 - 21, 2005). MDM '05. ACM, New York, NY, 89-94. 
DOI= http://doi.acm.org/10.1145/1133890.1133901&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Li and Prabhakaran propose a new gesture classification algorithm that generalizes easily to many input methods. They use the SVD of a motion matrix. A motion matrix has one column per feature of the data (such as the joint measurements from a CyberGlove) and one row per time step. SVD (singular value decomposition) factors a matrix into singular vectors and singular values. The right singular vectors of the motion matrix A are the eigenvectors of M = A'A (A transposed times A), and A's singular values are the square roots of M's eigenvalues. They work with M for computational efficiency. The top k eigenvectors are used (k is a parameter, with empirical evidence suggesting that k=6 is enough to perform well). To compare two motion matrices, corresponding eigenvectors are compared by their dot product (the cosine of the angle between them, since the eigenvectors have unit length), weighted by the ratio of the eigenvalues for those vectors. A value of 0 means the matrices have nothing in common, as all the corresponding eigenvectors are orthogonal. A value of 1 means the matrices have collinear eigenvectors. They call this kWAS, k Weighted Angular Similarity (for k eigenvectors and the weighted dot-product/cosine metric).&lt;br /&gt;&lt;br /&gt;Their algorithm works as follows. Start with a library of known gesture matrices, call them P, and compute their eigenvectors/values. Then watch the stream of incoming data, segmenting it with minimum length l and maximum length L and stepping through the stream with step size \delta. Call the matrices for these chunks Q and compare each of them to all the P. The Q,P pairing with the highest kWAS score is selected as the answer, and the search for the next gesture starts from the end of the segment with the max score.&lt;br /&gt;&lt;br /&gt;They report that their algorithm can recognize CyberGlove gestures with 99% accuracy with k=3 (it's not clear whether these are isolated patterns or streams), and motion capture data with 100% accuracy with k=4. 
However, it isn't clear exactly what these figures measure.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;So their method isn't really a segmentation method. They still just look at different sliding windows of data and pick the one that matches best. It works without requiring users to hold a position or return to a neutral state to delineate gestures, as many other systems do. However, Iba et al.'s system can do the same thing using hidden Markov models with a built-in wait state.&lt;br /&gt;&lt;br /&gt;Still, as a new classification approach, this is a nice one because it seems to give decent results and is not another HMM. &lt;br /&gt;&lt;br /&gt;They never say how they pick \delta. I wonder how different values affect the accuracy and running time of the algorithm.&lt;br /&gt;&lt;br /&gt;Some people might be concerned that once you reduce the data to eigenvectors, you lose temporal information. I can see where this would be a concern for some tasks. However, most of the time you can get good classification/clustering results without perfect temporal information. Temporal information can even confuse the issue, making things harder to compute and compare.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;BibTeX&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;@inproceedings{1133901,&lt;br /&gt; author = {Chuanjun Li and B. 
Prabhakaran},&lt;br /&gt; title = {A similarity measure for motion stream segmentation and recognition},&lt;br /&gt; booktitle = {MDM '05: Proceedings of the 6th international workshop on Multimedia data mining},&lt;br /&gt; year = {2005},&lt;br /&gt; isbn = {1-59593-216-X},&lt;br /&gt; pages = {89--94},&lt;br /&gt; location = {Chicago, Illinois},&lt;br /&gt; doi = {http://doi.acm.org/10.1145/1133890.1133901},&lt;br /&gt; publisher = {ACM},&lt;br /&gt; address = {New York, NY, USA},&lt;br /&gt; }&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-6778530728789300827?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/6778530728789300827/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=6778530728789300827&amp;isPopup=true' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/6778530728789300827'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/6778530728789300827'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/02/li-similarity-measure-svd-angular-for.html' title='Li - SImilarity Measure (SVD angular) for Stream Segmentation and Recognition'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-98865750278750743</id><published>2008-02-06T14:12:00.000-06:00</published><updated>2008-02-06T15:05:53.145-06:00</updated><title type='text'>Hernandez-Rebollar - Accelerometers and 
Decision Tree for ASL</title><content type='html'>Hernandez-Rebollar, J.L.; Lindeman, R.W.; Kyriakopoulos, N., "A multi-class pattern recognition system for practical finger spelling translation," Multimodal Interfaces, 2002. Proceedings. Fourth IEEE International Conference on , vol., no., pp. 185-190, 2002&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Hernandez-Rebollar et al. present a new algorithm for classifying ASL finger spelling letters. J and Z, the only letters that involve motion, are represented statically by the final posture of the gesture. They build their own glove so they don't have to sink a lot of money into the expensive options currently available. Their glove uses 5 accelerometers, one per finger, each measuring two axes. The y-axis is aligned to point at the fingertip and measures flexion and pitch. The x-axis gives an idea of roll, yaw, and abduction.&lt;br /&gt;&lt;br /&gt;They take the ten measurement values (two axes per finger, 5 fingers) and convert them to a 3D vector. The first dimension is the sum of the x-axis values, the second is the sum of the y-axis values, and the third is the y-axis value of the index finger, which they claim is adequate for describing how bent the palm is. &lt;br /&gt;&lt;br /&gt;The 3D vector is fed into a decision tree. With 5 signers doing 10 repetitions of each letter, they get 100% accuracy on 21 of the 26 letters. For I and Y, they get 96%. For U, V, and R, the accuracy is 90%, 78%, and 96%, respectively.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Again, another paper where they sum all their values to get a global picture. This is a horrible idea, since fingers will mask each other. At least sum the squares of the values, so you can see if some are really high compared to others. Or, better yet, use all 10 dimensions for the decision tree. It's really not that hard. &lt;br /&gt;&lt;br /&gt;It was nice to see something besides an HMM, and they do get pretty good results. 
However, I'm ready for J and Z to move.&lt;br /&gt;&lt;br /&gt;I also like their hardware approach. Seems simple and a lot less expensive than dropping 10-30K on a CyberGlove.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;BibTeX&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;@ARTICLE{rebollar2002multiClassFingerSpelling,&lt;br /&gt;title={A multi-class pattern recognition system for practical finger spelling translation},&lt;br /&gt;author={Hernandez-Rebollar, J.L. and Lindeman, R.W. and Kyriakopoulos, N.},&lt;br /&gt;journal={Multimodal Interfaces, 2002. Proceedings. Fourth IEEE International Conference on},&lt;br /&gt;year={2002},&lt;br /&gt;volume={},&lt;br /&gt;number={},&lt;br /&gt;pages={ 185-190},&lt;br /&gt;doi={10.1109/ICMI.2002.1166990},&lt;br /&gt;}&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-98865750278750743?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/98865750278750743/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=98865750278750743&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/98865750278750743'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/98865750278750743'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/02/hernandez-rebollar-accelerometers-and.html' title='Hernandez-Rebollar - Accelerometers and Decision Tree for ASL'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-1413176210769515383</id><published>2008-02-06T13:44:00.000-06:00</published><updated>2008-02-06T14:10:03.224-06:00</updated><title type='text'>Harling - Hand Tension for Segmentation</title><content type='html'>Philip A. Harling and Alistair D. N. Edwards. Hand tension as a gesture segmentation cue. In Philip A. Harling and Alistair D. N. Edwards, editors, Progress in Gestural Interaction: Proceedings of Gesture Workshop '96, pages 75--87, Springer, Berlin et al., 1997.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Harling and Edwards address the problem of segmenting gestures in a stream of data from a power glove. They assume that when we purposefully use our hands to convey information, the hands will be tense. When the hand is "limp", the user is not trying to convey information. &lt;br /&gt;&lt;br /&gt;Tension is measured by imagining rubber bands attached to each fingertip, one parallel to the x-axis and the other to the y-axis. The rubber bands have given elastic moduli, so the tension in the system can be computed with basic physics equations. To get an idea of overall hand tension, the values for each finger are summed. &lt;br /&gt;&lt;br /&gt;They evaluate their idea by examining two different phrases in British Sign Language: "My name" and "My name me". They find dips in the 'tension graph' between gestures, and claim an algorithm could segment at these points of low tension.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Seems pretty nice. It's good to have an idea of what we can do to solve the segmentation issue. However, I wonder if some gestures are performed with a "limp" hand. Their tension measure is maximized when a finger is either fully extended or fully closed, so any posture with half-bent fingers would register as low tension and could be mistaken for a non-gesture. 
Also, some people may naturally keep their hand clenched when relaxed, so their non-gestures would register as tense.&lt;br /&gt;&lt;br /&gt;I don't like that they sum the tension in each finger to get a total hand tension. I think we need information per finger. Otherwise, you could miss fingers moving in ways that keep the total tension at the same level.&lt;br /&gt;&lt;br /&gt;Their testing was &lt;i&gt;not&lt;/i&gt; very thorough. Another poor results section.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;BibTeX&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;@inproceedings{harling1996handTensionSegmentation&lt;br /&gt;,author = "Philip A. Harling and Alistair D. N. Edwards"&lt;br /&gt;,title = "Hand Tension as a Gesture Segmentation Cue"&lt;br /&gt;,booktitle = "Gesture Workshop"&lt;br /&gt;,pages = "75-88"&lt;br /&gt;,year = "1996"&lt;br /&gt;}&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-1413176210769515383?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/1413176210769515383/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=1413176210769515383&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/1413176210769515383'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/1413176210769515383'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/02/harling-hand-tension-for-segmentation.html' title='Harling - Hand Tension for Segmentation'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-6656210616467388460</id><published>2008-02-06T11:02:00.000-06:00</published><updated>2008-02-07T07:32:38.862-06:00</updated><title type='text'>Lee - Interactive Learning HMMs</title><content type='html'>Lee, Christopher, and Yangsheng Xu. "Online, Interactive Learning of Gesture of Human/Robot Interfaces."&lt;br /&gt;&lt;br /&gt;Okay, right off the bat, this paper has nothing to do with robots. Why put it in the title?&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Lee and Xu present an algorithm for classifying gestures with HMMs, evaluating the confidence of each classification, and using correct classifications to update the parameters of the HMM. The user has to pause briefly between gestures to aid segmentation. To simplify the data, they apply fast Fourier transforms (FFTs) to a sliding window of sensor data from the glove, collapsing the window. They then vector quantize the FFT results (using an offline codebook generated with the Linde-Buzo-Gray (LBG) algorithm), collapsing each vector to a one-dimensional symbol. The resulting symbol sequence is fed into the HMMs, one per class, and the class with the highest Pr(O|model) is selected as the answer. The gesture is then folded into the training set for that HMM, and the parameters are updated.&lt;br /&gt;&lt;br /&gt;They also introduce a confidence measure for analyzing their system's performance: the log of the sum, over all incorrect HMMs, of the ratio of that HMM's probability for the gesture to the correct HMM's probability. If a gesture is classified correctly, the correct HMM will have a higher probability than every incorrect HMM. Every ratio will then be &lt; 1, and as long as the ratios add up to less than 1, the log of their sum will be &lt; 0. If all the probabilities are about the same, the classifier is unsure and the ratios will all be around 1. The log will then be around 0 (strictly, around log(N-1) for N classes, since the ratios are summed rather than averaged). 
They show that, starting with one training example, they achieve accurate and confident classification after only a few test examples have been classified and used to update the system.&lt;br /&gt;&lt;br /&gt;However, they're only using a set of ASL letters that are "amenable to VQ clustering."&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;I do like the idea of online training and updating of the model. However, after a few users you lose the benefit, so it's better to train offline on a good set of data before any recognition takes place. That simplifies the system and reduces the workload.&lt;br /&gt;&lt;br /&gt;I don't like that you have to hold your hand still for a little bit between gestures. I would have liked to see a system like the "wait state" HMM system discussed in Iba et al., "An architecture for gesture based control of mobile robots." I'd like to see a better handle on the segmentation problem. They do mention using acceleration.&lt;br /&gt;&lt;br /&gt;Their training set is too small and easy, picking things that are "amenable to VQ clustering", so I don't give their system much credit.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-6656210616467388460?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/6656210616467388460/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=6656210616467388460&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/6656210616467388460'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/6656210616467388460'/><link rel='alternate' 
type='text/html' href='http://jbjohns.blogspot.com/2008/02/lee-interactive-learning-hmms.html' title='Lee - Interactive Learning HMMs'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-5503608096184400699</id><published>2008-02-04T10:22:00.000-06:00</published><updated>2008-02-06T11:02:27.395-06:00</updated><title type='text'>Chen - Dynamic Gesture Interface w/ HMMs</title><content type='html'>Chen, Qing.... "A Dynamic Gesture Interface for Virtual Environments Based on Hidden Markov Models." HAVE 2005&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Chen et al. use hidden Markov models (HMMs) to classify gestures (their focus is a simple domain of three gestures). Their algorithm computes the standard deviation of each bend sensor value on a glove. They argue that by using the standard deviation, they don't have to worry about segmentation. They feed the standard-deviation data into HMMs for classification. Their three gestures are very simple and control rotation about three axes of a virtual 3D cube.&lt;br /&gt;&lt;br /&gt;They give no recognition results.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;I'm not sure these guys are too well versed in machine learning. This paper is pretty weak. I'll just make a laundry list instead of trying to tie all my complaints together in prose.&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;They mention other approaches that have been used (Kalman filters, dynamic time warping, finite state machines), but state that these have "very strict assumptions." Okay, like what? 
Kalman filters and hidden Markov models are close cousins (both are state-space models; the Kalman filter is essentially the continuous, linear-Gaussian counterpart of the HMM), so why will HMMs do better than Kalman filters?&lt;/li&gt;&lt;br /&gt;&lt;li&gt;They say (page 2, first paragraph) that gestures are noisy, and that even when a person performs a gesture the same way, the data will still differ. Duh. Too bad. Measurements and data are noisy, just like everything in machine learning. Otherwise, you'd just look it up in a hash table and save yourself a lot of trouble.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;It's the Expectation-&lt;b&gt;&lt;u&gt;MAXIMIZATION&lt;/u&gt;&lt;/b&gt; algorithm, not -Modification.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;They claim to avoid the need for segmentation. Okay, then what are you computing the standard deviation of? You have to have some sort of window of points to do the calculations on. I suppose their assumption is that each window contains a whole gesture, not half of one, and things happen by magic.&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;Weak paper. Do not want. Would not buy from seller again.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-5503608096184400699?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/5503608096184400699/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=5503608096184400699&amp;isPopup=true' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/5503608096184400699'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/5503608096184400699'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/02/chen-dynamic-gesture-interface-w-hmms.html' title='Chen - Dynamic Gesture 
Interface w/ HMMs'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-7183417096284560268</id><published>2008-01-30T13:37:00.000-06:00</published><updated>2008-01-30T14:21:40.913-06:00</updated><title type='text'>Iba - Robots</title><content type='html'>Iba, Soshi, J. Michael Vande Weghe, Christiaan J. J. Paredis, and Pradeep K. Khosla. "An Architecture for Gesture-Based Control of Mobile Robots." Intelligent Robots and Systems, 1999.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Iba et al. describe a system that controls robots with gestures. The gestures are collected from a CyberGlove (for finger/hand posture) and a Polhemus tracker (hand tracking in six degrees of freedom), then recognized using hidden Markov models. Their argument for a glove-based interface is that it can be a more intuitive way to control robot movement. This is not necessarily true for a single robot, where a joystick can work with a higher degree of accuracy. Their primary claim concerns groups of robots, where controlling each individual robot becomes intractable and burdensome; such groups are easily controlled as a whole using gestures such as pointing and general motion commands. The commands were: an open, flat hand to continue motion; pointing to 'go there'; waving left or right to turn in that direction; and a closed fist to stop. &lt;br /&gt;&lt;br /&gt;Their hardware samples finger and wrist position and flexion at 30 Hz. The data is sent to a preprocessor, where the 18 values from the glove are linearly combined down to 10 values. These are then augmented with their first derivatives (the change since the previous time step). The resulting 20-dimensional vectors are vector quantized into a codeword. 
The codeword represents a coarse-grained view of the position/movement of the fingers/hand, with the codebook trained offline. 'Postures', then, become codewords, and gestures are sequences of codewords. &lt;br /&gt;&lt;br /&gt;The last &lt;i&gt;n&lt;/i&gt; codewords are fed into the HMMs, which include a mechanism for rejection (a 'wait' state that branches to the HMM for each gesture). The gesture is assigned to the HMM that gives the highest probability, computed with the forward algorithm.  &lt;br /&gt;&lt;br /&gt;They test their algorithm with and without the wait state to show that the wait state helps reject false positives. This matters because you don't want the robot to move when you don't mean it to, whereas after a false negative the gesture can simply be repeated. With the wait state, they got 96% true positives, with only 1.6/1000 false positives. Without the wait state, they got 100% true positives but 20/1000 false positives.  &lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;How did they come up with the best linear combination to use when reducing the glove data from 18 values to 10? &lt;br /&gt;&lt;br /&gt;I would like to see details on how they created their codebook. They say they used 5000 measurements that are representative of the feature space, but the feature space in this case is huge! Say each of the 18 sensors has 255 possible values. The 6 DOF of the hand tracker are three angular measurements with 360 values each (assuming integer-degree precision) and three real-valued position measurements. Say the tracker is accurate to the inch, and the cord is ten feet long. Let's make it easier for them and say you can only stand in front of the tracker, not behind it. That gives a ten-foot half sphere with volume (1/2)*(4/3)*PI*(10^3) = (2/3)*PI*1000, or about 2094, call it 2000 cubic feet. 
Let's cut that in half because some of the sphere is inaccessible (it goes into the floor or above the ceiling), so 1000 cubic feet, or 1000 * 12^3 = 1.728e6 cubic inches. Every sensor and angle varies independently, so the counts multiply. The number of distinct readings the hardware could produce is then roughly 255^18 * 360^3 * 1.728e6, on the order of 10^57. My assumptions about the ranges are probably off, but even if I'm off by many orders of magnitude, that's still a WHOLE FRIGGIN LOT MORE THAN 5000 POSITIONS. Now, how did they 'cover the entire space' adequately? Maybe they did, I don't know, but I'm skeptical. I suppose my beef is with their claim that they cover the &lt;b&gt;ENTIRE SPACE&lt;/b&gt;. I doubt it.  &lt;br /&gt;&lt;br /&gt;Something like multi-dimensional scaling might tell you which features are important. Or you could use an iterative, interactive process for creating new codewords. Start with their initial codebook. Then, for each incoming vector (or a random sample of them), check whether it is 'close enough' to an existing codeword, or fits into a cluster with 'high enough' probability (if your codeword clusters were described by a mixture of Gaussians, with the codewords as the means). If it's not good enough, start a new cluster. Maybe they did something like this, but they didn't say. &lt;br /&gt;&lt;br /&gt;So aside from those two long, preachy paragraphs, I really liked this algorithm. Quantizing the input vectors into codewords means your HMMs only have to deal with a fixed number (32) of distinct symbols. That makes the input discrete and the models easier to train, and you know exactly what to expect. &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;BiBTeX&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;@ARTICLE{iba1999gestureControlledRobots,&lt;br /&gt;title={An architecture for gesture-based control of mobile robots},&lt;br /&gt;author={Iba, S. and Vande Weghe, J. M. and Paredis, C. J. J. and Khosla, P. K.},&lt;br /&gt;journal={Intelligent Robots and Systems, 1999. IROS '99. Proceedings. 
1999 IEEE/RSJ International Conference on},&lt;br /&gt;year={1999},&lt;br /&gt;volume={2},&lt;br /&gt;number={},&lt;br /&gt;pages={851-857 vol.2},&lt;br /&gt;keywords={data gloves, gesture recognition, hidden Markov models, mobile robots, user interfaces, HMM, data glove, gesture-based control, global control, hand gestures, hidden Markov models, local control, mobile robots, wait state},&lt;br /&gt;doi={10.1109/IROS.1999.812786},&lt;br /&gt;ISSN={}, }&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-7183417096284560268?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/7183417096284560268/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=7183417096284560268&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/7183417096284560268'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/7183417096284560268'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/01/iba-soshi-j.html' title='Iba - Robots'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-396998331978257306</id><published>2008-01-30T13:35:00.000-06:00</published><updated>2008-01-30T14:12:34.423-06:00</updated><title type='text'>Deering - HoloSketch</title><content type='html'>Deering, Michael F. "HoloSketch: A Virtual Reality Sketching / Animation Tool." 
ACM Transactions on Computer-Human Interaction (2.3), 1995: 220-238. &lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Deering describes a system that uses a 3D mouse, a stereo CRT, and head/eye tracking. It lets the user draw in three dimensions and view the resulting objects in three dimensions just by moving their head around. The 3D mouse has a digitizer rod attached to it, acting as a wand that is used to draw and manipulate in 3D. Different button/keyboard combinations change the mode of the drawing program. The user can pull up a 3D context menu to perform different drawing and editing actions, including drawing many 3D primitives, coloring, moving, selecting, resizing, and even setting up animations. The 3D rendering is accurate enough that a physical ruler held up to the projected image gives correct measurements.  &lt;br /&gt;&lt;br /&gt;There are no algorithms or true implementation details presented in this paper, so I don't feel the need to do much summarization. You draw in 3D with a 3D mouse with a 'wand' poking out of it, much like you would in any 2D paint program. You look at the object in true 3D thanks to the stereoscopic display and head/eye tracking. &lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;I was fairly impressed with this, especially the accuracy of the 3D rendering. I'd like to see what could be done with this on modern hardware, especially with a truly wireless pen for unconstrained 3D movement. Or even different pens for doing different things, to reduce button clutter and complexity. I think this could be a super killer app, especially with sketch recognition capabilities! Or turn the stereo CRT (there are stereo LCDs now, btw) into wearable glasses for more of a HUD approach: augmented reality. &lt;br /&gt;&lt;br /&gt;I bet Josh P. drooled over this paper when he read it. 
But other than that, since there isn't an algorithm or anything besides interface information, I don't think I have much more to say that's really that useful. &lt;br /&gt;&lt;br /&gt;&lt;h3&gt;BiBTeX&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;@article{deering1995holosketch,&lt;br /&gt; author = {Michael F. Deering},&lt;br /&gt; title = {HoloSketch: a virtual reality sketching/animation tool},&lt;br /&gt; journal = {ACM Transactions on Computer-Human Interaction},&lt;br /&gt; volume = {2},&lt;br /&gt; number = {3},&lt;br /&gt; year = {1995},&lt;br /&gt; issn = {1073-0516},&lt;br /&gt; pages = {220--238},&lt;br /&gt; doi = {http://doi.acm.org/10.1145/210079.210087},&lt;br /&gt; publisher = {ACM},&lt;br /&gt; address = {New York, NY, USA},&lt;br /&gt; }&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-396998331978257306?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/396998331978257306/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=396998331978257306&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/396998331978257306'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/396998331978257306'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/01/deering-holosketch.html' title='Deering - HoloSketch'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-7634226120111529178</id><published>2008-01-29T15:58:00.000-06:00</published><updated>2008-01-30T13:35:10.834-06:00</updated><title type='text'>Rabiner and Juang -- Tutorial on HMMs</title><content type='html'>Rabiner, L.; Juang, B., "An introduction to hidden Markov models," ASSP Magazine, IEEE [see also IEEE Signal Processing Magazine], vol.3, no.1, pp. 4-16, Jan 1986&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Rabiner and Juang give an overview of hidden Markov models and some of the things you can do with them. &lt;br /&gt;&lt;br /&gt;HMMs are good for representing temporal sequences of data. The Markov property says that the current state of the system depends only on the previous state, not on the whole history. In an HMM the states themselves are hidden, and each observation depends only on the current state. HMMs work by holding information about a set of states, with transitions between the states occurring with different probabilities. Each state has a distribution of outputs. So if you were to use an HMM to generate a sequence of outputs (which is not how you use them for recognition), you'd start a random walk at an initial state (chosen according to the initial state probabilities \pi of the model). At each state you'd choose an output based on the state's output distribution, and then transition to a new state based on the transition probabilities from that state. Repeat until you output the desired number of symbols.&lt;br /&gt;&lt;br /&gt;Some neat things to do with hidden Markov models:&lt;br /&gt;&lt;ol&gt;&lt;br /&gt;&lt;li&gt;Given a model and observation sequence, compute the probability of that sequence occurring from that model. 
&lt;b&gt;Forward or Backward Algorithms&lt;/b&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Given a model and observation sequence, compute the sequence of states through the model that has the highest probability of producing the output (optimal path). &lt;b&gt;Viterbi Algorithm&lt;/b&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Given a set of observations, determine the model parameters that (locally) maximize the likelihood of those observations. &lt;b&gt;Baum-Welch Algorithm&lt;/b&gt;&lt;/li&gt;&lt;br /&gt;&lt;/ol&gt;&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Hidden Markov models are the gold standard for many machine learning classification tasks, including handwriting and speech recognition. While they have many powerful potential uses, they're still not a silver bullet for all tasks, especially if used incorrectly.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;BibTeX&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;@ARTICLE{rabiner1986introHMMs&lt;br /&gt;,title={An introduction to hidden {Markov} models}&lt;br /&gt;,author={L. R. Rabiner and B. H. 
Juang}&lt;br /&gt;,journal={IEEE ASSP Magazine}&lt;br /&gt;,year={1986}&lt;br /&gt;,month={Jan}&lt;br /&gt;,volume={3}&lt;br /&gt;,number={1}&lt;br /&gt;,pages={4-16}&lt;br /&gt;,ISSN={0740-7467}&lt;br /&gt;}&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-7634226120111529178?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/7634226120111529178/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=7634226120111529178&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/7634226120111529178'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/7634226120111529178'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/01/rabiner-and-huang-tutorial-on-hmms.html' title='Rabiner and Juang -- Tutorial on HMMs'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-9114667139105600773</id><published>2008-01-23T16:21:00.000-06:00</published><updated>2008-01-23T16:49:30.568-06:00</updated><title type='text'>Allen et al -- ASL Finger Spelling</title><content type='html'>Allen, J.M.; Asselin, P.K.; Foulds, R., "American Sign Language finger spelling recognition system," Proceedings of the IEEE 29th Annual Bioengineering Conference, pp. 285-286, 22-23 March 2003&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Allen et al. 
want to create a wearable computer system that can translate the ASL finger spelling used by the deaf into both written and spoken forms. Their intention is to lower communication barriers between deaf and hearing members of the community. &lt;br /&gt;&lt;br /&gt;Their system consists of a CyberGlove worn by the finger speller. The glove's sensors detect finger and palm bending, finger and wrist abduction, thumb crossover, etc. The glove is polled at a controlled sampling rate. The vector of sensor values is fed into a perceptron neural network that has been trained with examples of each of the 24 static letters ('J' and 'Z' require hand motion, so they were left out of this study). The network outputs the correct letter 90% of the time. However, their experiments were based on only one user. &lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;First, the authors of this paper are very condescending toward the Deaf community. If any Deaf people ever were to read this article, they would be seriously pissed. Obviously I'm not Deaf. Obviously I can't speak for all Deaf people. &amp;lt;/disclaimer&amp;gt; That being said, the Deaf community is very strong (I capitalize Deaf on purpose, as that's the way Deaf culture sees itself). They work hard to make themselves independent, not needing the help or assistance of the hearing. The motivation for the paper is sound, and technology like this would indeed lower some of the communication barriers.&lt;br /&gt;&lt;br /&gt;This doesn't seem too bad as a proof of concept. Motion needs to be incorporated to bring the 'J' and 'Z' characters into play. This system also needs to be &lt;b&gt;fast&lt;/b&gt;, as the Deaf can finger spell incredibly rapidly, as quickly as you can spell a word verbally. 
Natural finger spelling is not about making every letter distinct, but about capturing the "shape" of the word. This is the same way your brain works when it reads words on a page; remember that Cambridge study thing? http://www.jtnimoy.net/itp/cambscramb/ How distinct do the letters have to be for their approach to work? What sampling rate do they use? Can it be done in real time? (I guess not, since they say MATLAB stinks at real time.)&lt;br /&gt;&lt;br /&gt;Also, I would like to see results on misclassifications. Which letters do poorly (m and n look alike, and so do a, e, and s)? They also point out that accuracy is user-specific. Finger spelling is a set form, so surely there are ways to generalize recognition. Just train on more than one person. Neural nets could also be used to learn the 'in-between' transitions and give a little context for the letters before and after a transition.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;BiBTeX&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;@inproceedings{allen2003aslFingerSpelling&lt;br /&gt;,author={Jerome M. Allen and Pierre K. 
Asselin and Richard Foulds}&lt;br /&gt;,title={{American Sign Language} finger spelling recognition system}&lt;br /&gt;,booktitle={29th Annual IEEE Bioengineering Conference, 2003}&lt;br /&gt;,year={2003}&lt;br /&gt;,month={March}&lt;br /&gt;,pages={285-286}&lt;br /&gt;,doi={10.1109/NEBC.2003.1216106}&lt;br /&gt;}&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-9114667139105600773?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/9114667139105600773/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=9114667139105600773&amp;isPopup=true' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/9114667139105600773'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/9114667139105600773'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/01/allen-et-al-asl-finger-spelling.html' title='Allen et al -- ASL Finger Spelling'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-5233346385351823009</id><published>2008-01-22T11:07:00.000-06:00</published><updated>2008-01-22T11:50:24.822-06:00</updated><title type='text'>Krueger -- Environmental Technology</title><content type='html'>Krueger, Myron W. "Environmental technology: making the real world virtual". Communications of the ACM (36.7), July 1993: pp. 
36-37.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Not much to this one. It's filler for a glossy magazine. The interesting points are the use of gesture and hand input for virtual environments. VIDEOPLACE and VIDEODESK use image capture (cameras) to track movement in the environment and allow for the interaction with a virtual world, as well as collaboration with others. &lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Not really anything to say here.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;BiBTeX&lt;/h3&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;@article{krueger1993environmentalTechnology&lt;br /&gt;,author = {Myron W. Krueger}&lt;br /&gt;,title = {Environmental technology: making the real world virtual}&lt;br /&gt;,journal = {Communications of the ACM}&lt;br /&gt;,volume = {36}&lt;br /&gt;,number = {7}&lt;br /&gt;,year = {1993}&lt;br /&gt;,issn = {0001-0782}&lt;br /&gt;,pages = {36--37}&lt;br /&gt;,doi = {http://doi.acm.org/10.1145/159544.159563}&lt;br /&gt;,publisher = {ACM}&lt;br /&gt;,address = {New York, NY, USA&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-5233346385351823009?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/5233346385351823009/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=5233346385351823009&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/5233346385351823009'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/5233346385351823009'/><link rel='alternate' type='text/html' 
href='http://jbjohns.blogspot.com/2008/01/krueger-environmental-technology.html' title='Krueger -- Environmental Technology'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-6131657498120170348</id><published>2008-01-22T10:16:00.000-06:00</published><updated>2008-01-22T11:51:24.384-06:00</updated><title type='text'>Deller et al -- Flexible Gesture Recognition</title><content type='html'>Deller, Matthias, et al. "Flexible Gesture Recognition for Immersive Virtual Environments." In Proceedings of the Tenth International Conference on Information Visualization, 2006 (IV'2006), pp. 563-568, July 2006.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Deller et al. seek to create a flexible gesture recognition engine for glove-based (or other hand-tracking) interfaces. Their goal is accurate gesture recognition in an immersive 3D environment, where users can naturally use their hands with minimal "cognitive load" to distract them. They want an engine that works regardless of the environment the gloves/hand-tracking system is deployed in or what kind of hardware is used. They list several current approaches to hand tracking and gesture recognition, most of which they describe as requiring expensive hardware or fancy image processing techniques (if cameras are involved).&lt;br /&gt;&lt;br /&gt;Their approach is to abstract gesture recognition to a higher level (gasp, the use of basic programming paradigms!). Regardless of how the data is captured (gloves or image processing), it is treated as a sequence of postures. A posture is the position of the fingers and orientation of the hand, held for a certain amount of time (the glove is constantly polled). 
A sequence of postures defines a gesture. During training, postures are performed to create templates. Many examples of a posture can be performed, even per user, and averaged into a single template. These templates form a posture library.&lt;br /&gt;&lt;br /&gt;When the system sees a posture (an orientation of fingers and glove held for a certain amount of time), it first filters and smooths the data to reduce noise. This matters especially for hand orientation data, which the hardware they used was bad at determining. The smoothed data is sent to the recognizer, which compares the posture(s) to every template in the library using a distance metric on the bend vector (the values of the five finger bend sensors). Templates whose distance is below a threshold are flagged as candidates. If orientation matters for a posture, the orientation data is then used to weight the match. Sequences of postures make a gesture. &lt;br /&gt;&lt;br /&gt;Tested empirically in an environment, but no hard results. :(&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;So it's the first paper of the semester and already we see the phrase "the most natural way." I immediately sensed red flags. But I think I might agree here that the most natural way to interact with your environment is through touch. Our brains are geared, after all, to tactile dexterity. We have opposable thumbs, and our use of tools is supposedly what makes us different from the monkeys and sea-horses, ad nauseam, ad infinitum. So lower the red flags. Hands are good. Pens? Maybe not, but that's a debatable issue.&lt;br /&gt;&lt;br /&gt;Distance metrics make me uneasy, especially when you start throwing around averages and thresholds. I think this method is a good candidate for using Gaussian distributions to model the positions of the five fingers. Since you're providing multiple examples of each posture, just keep track of the mean bend for each finger and their covariance. 
This gives you a match probability against each template in the library, which seems a little more robust than raw distances. With this approach you may also be able to fold orientation into the same vector as the bend sensors, just as extra dimensions. For postures where orientation is not important, let the orientation variance go toward infinity (in practice, make it very large), so that variation in orientation has little or no effect on the probability. &lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;Cognitive burden: holding the gesture for 300-600 ms. Is there a study on this? I would like to see some results. It seems user-dependent, especially if the user is a "power-user" or "n00blet."&lt;/li&gt;&lt;br /&gt;&lt;br /&gt;&lt;li&gt;What happens when more than one posture is below the distance matching threshold? Do they just pick the one with the lowest distance?&lt;/li&gt;&lt;br /&gt;&lt;br /&gt;&lt;li&gt;In Section 5 they mention the "normal consumer grade computer". Granted, you can get a quad-core, 4 GiB RAM, 256 MiB graphics card rig from Dell for $1500. But "normal consumer grade" is probably closer to the $300 Acer/e-Machines mom and dad buy you from Wal-Mart. 
Specifications would be nice for their target machine.&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;BiBTeX&lt;/h3&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;@inproceedings{deller2006flexibleGesture&lt;br /&gt;,author={Matthias Deller and Achim Ebert and Michael Bender and Hans &lt;br /&gt;,title={Flexible Gesture Recognition for Immersive Virtual Environments}&lt;br /&gt;,booktitle={Tenth International Conference on Information Visualization, 2006 (IV'06)}&lt;br /&gt;,year={2006}&lt;br /&gt;,month={July}&lt;br /&gt;,pages={563-568}&lt;br /&gt;,doi={10.1109/IV.2006.55}&lt;br /&gt;,ISSN={1550-6037},&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-6131657498120170348?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/6131657498120170348/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=6131657498120170348&amp;isPopup=true' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/6131657498120170348'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/6131657498120170348'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2008/01/deller-et-al-flexible-gesture.html' title='Deller et al -- Flexible Gesture Recognition'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-3725835631342013646</id><published>2007-12-14T11:19:00.000-06:00</published><updated>2007-12-14T11:32:29.817-06:00</updated><title type='text'>Handwriting in LADDER</title><content type='html'>&lt;a href="http://cs.ecs.baylor.edu/~johnstoj/jbjohns_ladderHandwriting/jbjohns_ladderHandwriting.html"&gt;FLV video/demo&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://docs.google.com/Presentation?id=dczgw4z9_6drzsnhc4"&gt;Presentation&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;All stuff Copyright 14 Dec. 2007, Joshua Johnston.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-3725835631342013646?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/3725835631342013646/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=3725835631342013646&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/3725835631342013646'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/3725835631342013646'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2007/12/handwriting-in-ladder.html' title='Handwriting in LADDER'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-6357045503488579729</id><published>2007-12-13T08:24:00.000-06:00</published><updated>2007-12-13T08:52:42.726-06:00</updated><title type='text'>Alvarado and Lazzareschi -- Properties of Diagrams</title><content type='html'>Alvarado, Christine, and Michael Lazzareschi. "Properties of Real-World Digital Logic Diagrams." PLT 2007.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Alvarado and Lazzareschi examine unconstrained sketches of digital logic diagrams that were created by students in a real-world setting: the classroom. Thirteen students in a digital logic class were given Tablet PCs, and all notes, lab work, etc., were completed by sketching on the tablet. The authors took the first portion of the class's notes (the portion dealing with low-level diagramming) and extracted 98 individual diagrams. All of the diagrams were then labeled by hand to identify symbols. The labeled diagrams were analyzed to determine the impact of stroke order, stroke timing, and the number of strokes students use to draw real sketches. This analysis sheds light on many of the underlying assumptions that sketch recognition algorithms make to boost their recognition rates.&lt;br /&gt;&lt;br /&gt;The authors examined whether stroke ordering could help delineate different objects in the diagrams by checking whether students finished one symbol before starting to draw strokes elsewhere. This assumption failed almost 20% of the time: of all the symbols in the diagrams, nearly 20% were drawn with nonconsecutive strokes. 
While many of these strokes appeared to be overtracing or touch-ups, this is still a significant fraction.&lt;br /&gt;&lt;br /&gt;The authors next looked at the timing of strokes from one object to another, checking whether different shapes in the diagram could be delineated using a threshold on pause time. They found that, on average, users made shorter pauses between strokes in the same shape and longer pauses between shapes. This difference is significant under the Wilcoxon rank-sum test. However, there is a great deal of overlap in how long users paused. For each user, the authors determined the "optimal" pause threshold for separating symbols and computed the error rate that would result if timing were the only basis for deciding whether a stroke begins a new symbol. Across users, the error ranged from 10% to 35%, averaging 20%. This is a significant amount of error.&lt;br /&gt;&lt;br /&gt;Last, the authors looked at the number of strokes used to draw each symbol. They found that the number of strokes per symbol varied not only from student to student and from symbol to symbol, but also within the same user drawing the same symbol. Moreover, some users were more consistent, while others showed more variation. However, the authors did conclude that for this domain (circuit logic diagrams), the overwhelming majority of strokes were used to draw only a single symbol. This lends some support to the assumption that a single stroke is not used to draw more than one symbol, though the authors believe this is domain-specific.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;I really liked this paper, as well as the other HMU paper about interface design issues. Both have shed a lot of light on what must be done for sketch recognition to be a viable player in the education community. 
I limit this to the education community because I wonder whether the same problems would appear in a professional setting.&lt;br /&gt;&lt;br /&gt;For instance, the authors talk about students returning to symbols for touch-up or to overtrace, possibly while thinking about what to draw next. It seems like a professional would not have these tendencies and would be more inclined to "get it right the first time." Not to bash students, but they are, after all, students... and learning these things for the first time. Of course they aren't going to process things as quickly as a professional.&lt;br /&gt;&lt;br /&gt;Also, I don't think some of the other findings, such as using a single stroke to draw multiple symbols or the number of strokes used to create symbols, would hold for professionals. First, I think that pros would have a very set and familiar way to draw symbols, so there would be little variation across different instances of their own symbols. Second, I think that with expertise comes a standard way of drawing things, even across pros, or at least everyone gets more comfortable and the number of strokes per symbol tends to decrease. Also, the more experienced and comfortable you are with laying out your diagram "by the seat of your pants" with only a mental image in your head, the more you would draw single strokes running from one shape to the next, lifting your pen only when you had to.&lt;br /&gt;&lt;br /&gt;I also think the timings for intra- and inter-shape strokes would be closer together, as a pro does not have as much need to pause from one shape to the other. 
They know these things and can do them on the fly with little error.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-6357045503488579729?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/6357045503488579729/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=6357045503488579729&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/6357045503488579729'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/6357045503488579729'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2007/12/alvarado-and-lazzareschi-properties-of.html' title='Alvarado and Lazzareschi -- Properties of Diagrams'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-7695081957834716803</id><published>2007-12-13T00:10:00.000-06:00</published><updated>2007-12-13T01:53:45.743-06:00</updated><title type='text'>Wais et al. -- Perception of Interface Elements</title><content type='html'>Wais, Paul, Aaron Wolin, and Christine Alvarado. "Designing a Sketch Recognition Front-End: User Perception of Interface Elements." Eurographics 2007.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Wais, Wolin, and Alvarado perform a Wizard of Oz study to examine several interface aspects of a sketch-based circuit diagram application. 
They wanted to examine three aspects of sketch recognition systems:&lt;br /&gt;&lt;ol&gt;&lt;br /&gt;&lt;li&gt;When to perform recognition and how to provide feedback about the results&lt;/li&gt;&lt;br /&gt;&lt;li&gt;How the user can indicate which parts of the sketch she wants recognized&lt;/li&gt;&lt;br /&gt;&lt;li&gt;How recognition errors affect usability&lt;/li&gt;&lt;br /&gt;&lt;/ol&gt;&lt;br /&gt;&lt;br /&gt;The authors experimented with nine subjects, asking them to draw circuit diagrams with a pen-based input device (the subjects had experience with both the domain and the devices). Three sets of experiments tested the above aspects. For the first aspect, the system used a button press, a check-tap gesture, or a 4-second delay as the trigger for starting stroke recognition. Users preferred the button because it gave them reliable control (the gesture was difficult to recognize and worked only about one time in six). They preferred to use the button to recognize large chunks all at once. When recognition occurred, most users preferred that strokes change color based on recognition status (recognized or not) rather than having text labels appear, as the labels cluttered the drawing. However, most users still hovered over the "recognized" portions to check whether the label was correct (no actual recognition took place in this Wizard of Oz setup). The system also introduced different types of random errors into the sketch. Errors themselves didn't seem to bother users too much, as long as they were predictable and lent themselves to learning and correction. Random errors did not allow this and were frustrating to users. Users also wanted separate spaces, or separate colors, to indicate what should be recognized in a sketch and what should be left alone.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;This was a neat paper. It was nice to see an interface paper saying what people actually liked and what works in a sketch recognition application. 
I'm not an interfaces person, so this was really the first time I had seen this addressed. &lt;br /&gt;&lt;br /&gt;Regarding their pause trigger for recognition: it seems like this might be something that can be learned. Say you take the mean time between the user's last pen stroke and their press of the 'recognize' button. This would capture their typical pause. Of course, I think using a pause is a bad idea in general. Users expressed the desire for control and predictability. Give it to them rather than having a magical timeout that they neither see nor control.&lt;br /&gt;&lt;br /&gt;I would have liked to have seen some interface issues concerning ways of correcting errors. Their method is just for the user to erase the problematic stroke and redraw it. Obviously they're just using a dummy application for their Wizard of Oz approach, but I think it would be nice to have a drop-down for an n-best list, one of the options being "None of the Above, Plan 9." This magical option would return the strokes to their original form and allow the user to lasso (or otherwise group) the strokes that should be recognized together. This seems like a burden to the user, especially when the push is for things that are fully automatic and uber-accurate. Well, maybe that's not realistic, at least not yet. Even if it is uber-accurate, it will still make mistakes. If the user can help it fix those mistakes, it can possibly learn. Or, at least, it makes it easier for the user to clean up your program's boo boo.&lt;br /&gt;&lt;br /&gt;Also, I've nearly decided that gestures are horrible. We already have an application with imperfect sketch recognition, and we're throwing in mega-imperfect gestures! I wonder if anyone knows of a good gesture toolkit out there, and I'm talking like 99.99%, none of this 1/6 (ONE CORRECT TO FIVE INCORRECT, ~17%) laaaaaame product.  
But this isn't their fault; it's the gesture toolkit's fault.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-7695081957834716803?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/7695081957834716803/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=7695081957834716803&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/7695081957834716803'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/7695081957834716803'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2007/12/wais-et-al-perception-of-interface.html' title='Wais et al. -- Perception of Interface Elements'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-678176790260777559</id><published>2007-12-12T15:49:00.000-06:00</published><updated>2007-12-12T17:04:26.967-06:00</updated><title type='text'>Alvarado -- SketchREAD</title><content type='html'>Alvarado, Christine, and Randall Davis. "SketchREAD: A Multi-Domain Sketch Recognition Engine."&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Alvarado presents a new engine for sketch recognition that uses dynamic Bayesian networks (DBNs). Incoming strokes are parsed, as they are drawn, into sets of low-level primitives using Sezgin/Stahovich's toolkit. 
These primitives are fed into DBNs that give probabilities for the strokes forming certain shapes. Predefined DBN templates are instantiated and linked together to match the primitives and connect them in a logical fashion. The DBNs are interconnected to form networks that yield probabilities for high-level shapes and for what the authors call domain shapes and domain patterns.&lt;br /&gt;&lt;br /&gt;The DBNs are used both for bottom-up recognition, where the primitives are pieced together into shapes, etc., and for top-down recognition. The top-down portion of the program allows the engine to compensate for shapes that have not yet been finished. It also allows the system to revisit primitives and shapes and reassess their accuracy. For instance, if a stroke is split into a 5-line polyline, but the domain has no 5-line shapes, the top-down recognizer can prune the 5-line hypothesis and generate others to test, such as 2-, 3-, and 4-line hypotheses. Hypotheses are generated at each stage of recognition to see which interpretation of the strokes/shapes is most likely. The system has a pruning mechanism that discards hypotheses without enough support in order to keep the state space from exploding, but this sometimes prunes correct hypotheses that just have not had enough time to develop (e.g., the user is still drawing the finishing strokes). The top-down recognizer is expected to re-add these pruned-but-correct hypotheses.&lt;br /&gt;&lt;br /&gt;The engine was tested on two sets of sketches: 10 family trees that were relatively clean and simple, and 80 circuit diagrams, which were far more complex. On average, SketchREAD got 77% of the family tree shapes right and 62% of the circuit shapes right.&lt;br /&gt;&lt;br /&gt;One of the reasons SketchREAD performed "poorly" is that it makes several limiting assumptions to help avoid state-space explosion when searching for hypotheses. 
The first is that all the lines in a polyline are considered to be one shape, instead of trying each line separately. Additionally, the template DBNs were often not matched to the strokes correctly, and the engine was not allowed to try all possible matches.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;This was a pretty neat approach. I like the way low-level errors were corrected later on. This seems like a must-have approach for any system that hopes to be widely successful. Of course, if you make better low-level recognizers, like Paulson's, this becomes less of an issue. But still, nothing is perfect, as Alvarado mentions.&lt;br /&gt;&lt;br /&gt;There were several limitations in the system that I wish Alvarado could get around. I understand they're there to limit the size of the hypothesis space, but they ended up making things hard for her to get right. One of them was that all the lines in a polyline were considered to be part of one shape. This killed them in the circuit domain, where a user may go on and on drawing wires and resistors without ever lifting the pen. But trying to break a polyline into n lines and process each of the n lines separately might be too time-consuming. &lt;b&gt;However&lt;/b&gt;, if a user had drawn all the lines separately, recognition rates would have been higher. Sure, the system would have slowed down, but it's going to do that anyway the more you draw. Again, it's a tradeoff between accuracy and complexity.&lt;br /&gt;&lt;br /&gt;Also, in Figure 8, they note that the bottom-left ground is too messy for their top-down recognizer to correct. Well, honestly, it looks fine to me. I would like a way to incorporate vision-based approaches for exactly this kind of error. It might be too messy when you crunch the numbers, but it /looks/ like a ground. 
Perhaps combining a vision-based likelihood could boost it up?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-678176790260777559?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/678176790260777559/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=678176790260777559&amp;isPopup=true' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/678176790260777559'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/678176790260777559'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2007/12/alvarado-sketchread.html' title='Alvarado -- SketchREAD'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-8057640440553068480</id><published>2007-12-12T08:01:00.000-06:00</published><updated>2007-12-12T11:13:13.577-06:00</updated><title type='text'>Sezgin -- Temporal Patterns</title><content type='html'>Sezgin, Tevfik Metin, and Randall Davis. "Sketch Interpretation Using Multiscale Models of Temporal Patterns." IEEE Sketch Based Interaction, Jan/Feb 2007: 28--37.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Sezgin seeks to model sketches by analyzing the order in which strokes are made. He constructs two types of models. The first is a stroke-level model, trained on how strokes are drawn in relation to one another. 
This model looks at the order in which strokes are drawn, with the assumption that objects are generally drawn in the same stroke order each time.&lt;br /&gt;&lt;br /&gt;The second level of recognition occurs at the object level. This model combines the stroke-level models in a way that describes the order in which objects tend to occur in relation to one another. For instance, when drawing a resistor, users might usually draw a wire, then the resistor, then another wire. &lt;br /&gt;&lt;br /&gt;Both models are represented as dynamic Bayesian networks (DBNs). DBNs capture the temporal dependencies of the strokes and objects, acting like first-order Markov models. They are trained on a set of sketches. Each sketch is preprocessed and broken into a set of primitives (lines, arcs, etc.), and an observation vector is created for each primitive. The observation vectors are fed into the DBNs, and the state transition probabilities, etc., are trained. For classification, a sketch is broken into primitives/observation vectors in the same way and fed into the DBNs. The interpretation with the highest likelihood is chosen as the correct classification.&lt;br /&gt;&lt;br /&gt;Results were collected from eight users who each drew 10 sketches of electrical diagrams. They got about 80-90% accuracy over the entire sketch. One of the biggest weaknesses of their approach is that it is time-based, so any strokes drawn out of the expected order can throw a wrench in the works.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;This was a pretty neat approach to sketching, something that we've not seen before. I like the way that temporal patterns are examined, because I do feel a lot of information can be extracted from stroke/object ordering. However, as shown in the paper, it can also be a hindrance. Strokes that are drawn out of the expected order can cause recognition errors. 
This might be correctable with more training data, where outliers become more common and the system can account for them.&lt;br /&gt;&lt;br /&gt;They might also be able to account for out-of-order strokes by adding a bit of ergodic behavior to the models. Right now everything moves left to right, from the start to the completion of each object, and then on to the next object. Adding ergodic behavior would allow back transitions, self transitions, and possible jumps from the middle of one object to another. &lt;br /&gt;&lt;br /&gt;Sezgin mentioned that his PhD work dealt with creating a system that could "suspend" one object's model and start another if strokes occurred out of order. Of course, what happens then if you start a third? Is there a way to generalize this approach so you can have any number n of objects started at once?&lt;br /&gt;&lt;br /&gt;The recognition results weren't great, but this is a groundbreaking paper, since nothing like this has really been done before. The recognition rates will get better as the models improve. I would like to see something like this combined with a vision-based approach: a system where strokes and objects are identified not just by the order in which they appear, but also by their spatial arrangement. 
This might help eliminate some of the temporal confusion Sezgin encountered, since something "far away" would not be considered even if it occurred next.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-8057640440553068480?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/8057640440553068480/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=8057640440553068480&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/8057640440553068480'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/8057640440553068480'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2007/12/sezgin-temporal-patterns.html' title='Sezgin -- Temporal Patterns'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-6701402688730110977</id><published>2007-12-11T20:05:00.000-06:00</published><updated>2007-12-11T21:53:32.853-06:00</updated><title type='text'>Wobbrock et al. -- $1 Recognizer</title><content type='html'>Jacob Wobbrock, Andrew Wilson, and Yang Li. "Gestures without Libraries, Toolkits, or Training: A $1 Recognizer for User Interface Prototypes." UIST 2007.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Wobbrock et al. present a very simple single-stroke recognizer that uses a template-based approach; it is named "$1" because it is cheap and easy to create. 
Their method works in four basic steps.&lt;br /&gt;&lt;br /&gt;First, the stroke is resampled. Given a parameter N, the number of points in the resampled stroke, and the stroke length L, $1 interpolates N points (possibly new ones) along the original stroke path, spaced L/(N-1) units apart. This gives fast and slow strokes a uniform representation so they can be compared to each other (fast strokes are less dense than slower strokes of the same length).&lt;br /&gt;&lt;br /&gt;Second, the resampled stroke is rotated so that comparisons are more or less rotation-invariant. The angle of rotation (the "indicative angle") is computed from the line segment connecting the stroke's centroid and its first point, and the stroke is rotated so that this segment lies along the horizontal axis at 0 degrees.&lt;br /&gt;&lt;br /&gt;Third, the resampled and rotated points are scaled so that the bounding box matches a reference square of a certain size, disregarding the original aspect ratio. The points are then translated so that the centroid (mean along x and y) is moved to the origin (0,0). These steps make the strokes more scale- and position-invariant.&lt;br /&gt;&lt;br /&gt;Fourth, the stroke is compared to each template over a series of small rotations to find the best alignment. The rotation angle is searched between a minimum and a maximum angle using a golden-section search. The score rating the match between stroke and template is the average distance between corresponding points of the stroke and template, divided by half the length of the reference square's diagonal (used as a bound on the path distance), and subtracted from 1 (so higher scores are better). The templates are ranked by score and returned.&lt;br /&gt;&lt;br /&gt;The authors compared their recognizer with variations of Rubine's algorithm and a dynamic time warping (DTW) algorithm, using 10 entries of 16 gestures. 
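&lt;br /&gt;&lt;br /&gt;The four steps above can be sketched in Python as follows. This is a minimal illustration written for this summary, not the appendix code from the paper. The function names and the constants are assumptions chosen for the example: 64 points, a 250-unit reference square, a search window of plus or minus 45 degrees, and a 2-degree stopping tolerance. As far as I can tell from the paper's pseudocode, points are spaced L/(n-1) apart and the indicative angle is measured from the centroid, so the sketch does the same.&lt;br /&gt;&lt;pre&gt;
```python
import math

# Assumed constants (not from the post): typical values for a $1-style recognizer.
N_POINTS = 64                       # points per resampled stroke
SQUARE_SIZE = 250.0                 # side of the reference square
HALF_DIAGONAL = 0.5 * math.hypot(SQUARE_SIZE, SQUARE_SIZE)
ANGLE_RANGE = math.radians(45.0)    # rotation search window: -45 to +45 degrees
ANGLE_TOL = math.radians(2.0)       # stop once the window is this narrow
PHI = 0.5 * (math.sqrt(5.0) - 1.0)  # golden-ratio constant, about 0.618

def path_length(pts):
    return sum(math.dist(pts[i - 1], pts[i]) for i in range(1, len(pts)))

def resample(pts, n=N_POINTS):
    """Step 1: n points evenly spaced along the path, L / (n - 1) apart."""
    interval = path_length(pts) / (n - 1)
    pts = list(pts)
    out = [pts[0]]
    acc = 0.0                       # distance walked since the last emitted point
    i = 1
    while i != len(pts):            # pts grows as interpolated points are inserted
        d = math.dist(pts[i - 1], pts[i])
        if acc + d >= interval:
            t = (interval - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)        # q becomes the start of the next segment
            acc = 0.0
        else:
            acc += d
        i += 1
    if len(out) == n - 1:           # rounding can drop the final point
        out.append(pts[-1])
    return out[:n]

def centroid(pts):
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))

def rotate_by(pts, theta):
    """Rotate points by theta radians about their centroid."""
    cx, cy = centroid(pts)
    c, s = math.cos(theta), math.sin(theta)
    return [((x - cx) * c - (y - cy) * s + cx,
             (x - cx) * s + (y - cy) * c + cy) for x, y in pts]

def rotate_to_zero(pts):
    """Step 2: rotate so the indicative angle (first point to centroid) is 0."""
    cx, cy = centroid(pts)
    return rotate_by(pts, -math.atan2(cy - pts[0][1], cx - pts[0][0]))

def scale_and_translate(pts, size=SQUARE_SIZE):
    """Step 3: non-uniform scale to the reference square, centroid to origin."""
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    w = (max(xs) - min(xs)) or 1.0  # avoid dividing by zero on a flat stroke
    h = (max(ys) - min(ys)) or 1.0
    scaled = [(x * size / w, y * size / h) for x, y in pts]
    cx, cy = centroid(scaled)
    return [(x - cx, y - cy) for x, y in scaled]

def normalize(raw):
    return scale_and_translate(rotate_to_zero(resample(raw)))

def path_distance(a, b):
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def distance_at_best_angle(pts, tmpl, lo=-ANGLE_RANGE, hi=ANGLE_RANGE):
    """Step 4: golden-section search over rotation angles in [lo, hi]."""
    x1 = PHI * lo + (1 - PHI) * hi
    f1 = path_distance(rotate_by(pts, x1), tmpl)
    x2 = (1 - PHI) * lo + PHI * hi
    f2 = path_distance(rotate_by(pts, x2), tmpl)
    while abs(hi - lo) > ANGLE_TOL:
        if f2 > f1:                 # best angle lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = PHI * lo + (1 - PHI) * hi
            f1 = path_distance(rotate_by(pts, x1), tmpl)
        else:                       # best angle lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = (1 - PHI) * lo + PHI * hi
            f2 = path_distance(rotate_by(pts, x2), tmpl)
    return min(f1, f2)

def recognize(raw, templates):
    """Rank templates by score = 1 - distance / HALF_DIAGONAL, best first."""
    pts = normalize(raw)
    scored = [(1.0 - distance_at_best_angle(pts, t) / HALF_DIAGONAL, name)
              for name, t in templates.items()]
    return sorted(scored, reverse=True)
```
&lt;/pre&gt;&lt;br /&gt;To use it, call normalize() once on an example stroke for each gesture to build a dictionary of templates. Then pass a new raw stroke (a list of (x, y) tuples) to recognize(), which returns (score, name) pairs with the best match first. For example, with templates built from a circle and a triangle, an enlarged and shifted copy of the circle is ranked first. This works because resampling, rotation, scaling, and translation remove those differences before matching. Golden-section search needs only about ten distance evaluations per template here, instead of trying every angle.&lt;br /&gt;&lt;br /&gt;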
DTW and $1 perform almost indistinguishably, but $1 is much simpler and runs much faster.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;First, my beef with the title: no training or libraries? What do you call the templates, one per type of shape you want to recognize? I understand their hyperbole, but it seems a little ridiculous, even for UIST.&lt;br /&gt;&lt;br /&gt;It's important to remember the purpose of this recognizer. This is not meant to be the next groundbreaking method that pushes 99.99% accuracy in every possible domain. Even their title admits as much: "for User Interface &lt;b&gt;Prototypes&lt;/b&gt;." This, I believe, is meant to be something you can slap together an hour before a customer/investor comes over and still get decent recognition results. The great part about this algorithm is that the complete pseudocode is listed in the appendix, taking most of the mythos and shrouded mystery out of sketch recognition (no hidden Markov models or Bayes nets to code).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-6701402688730110977?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/6701402688730110977/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=6701402688730110977&amp;isPopup=true' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/6701402688730110977'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/6701402688730110977'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2007/12/wobbrock-et-al-1-recognizer.html' 
title='Wobbrock et al. -- $1 Recognizer'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-8053532994385855973</id><published>2007-11-08T12:17:00.000-06:00</published><updated>2007-11-08T12:39:43.454-06:00</updated><title type='text'>Adler and Davis -- Speech and Sketching</title><content type='html'>Adler, A., and R. Davis. "Speech and sketching: An empirical study of multimodal interaction." EUROGRAPHICS, 2007.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Adler and Davis set up a user study to analyze the ways in which speech and sketching interact in a multimodal interface. They created software that runs on two Tablet PCs and mirrors, in real time, what is drawn on one tablet onto the other, offering a multitude of pen colors, highlighting colors, the ability to erase, and pressure-based line thickness. This allows a participant (one of 18 students) and an experimenter to interact with each other. The participants completed four drawings, including electronic schematics, a floor plan, and a project. The schematics were available for study before the drawing began but not during, making all four tasks as free-form as possible.&lt;br /&gt;&lt;br /&gt;The authors recorded the drawings, video, and audio and synchronized them all. They then labeled all the data for analysis. Some of the main things they found were:&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;Creation/writing strokes accounted for the super-majority of the strokes (90%) and ink (80%) in the sketch&lt;br /&gt;&lt;li&gt; Color changes were used to denote different groupings of strokes for emphasis&lt;br /&gt;&lt;li&gt; Most of the speech was broken and repetitive. 
This would make it hard for a full recognizer to analyze, but the repetitions provided clues about the user's intent. &lt;br /&gt;&lt;li&gt; Speech occurred at the same time the referenced objects were being drawn&lt;br /&gt;&lt;li&gt; The ordering of speech and drawing was the same&lt;br /&gt;&lt;li&gt; The open-ended nature of the sketching and speech interaction between the experimenter and participant evoked a lot of speech and clarification from the participant when asked simple questions by the experimenter&lt;br /&gt;&lt;li&gt; Parts of the speech that did not relate to the actual drawing itself gave hints as to the user's intentions&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;Analyzing these results, the authors found that, for the most part, speech began before the sketching did when considering phrase groups broken up by pauses in the speech. However, when looking at word groups (saying "diode" when drawing one), the sketch usually came first. Additionally, the time difference between speech and sketch is statistically significant.&lt;br /&gt;&lt;br /&gt;So, overall, giving the user colors to play with and using speech gives a system a lot more information with which to do a better job. &lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;I found it interesting that the participants gave up more information than was needed or asked for when answering the experimenter's questions, as if wanting to make sure the experimenter understood completely. I wonder whether the user would give up as much information if there were no human present. I hypothesize that a user would just expect a computer to magically understand what was meant and would not talk to it. Even in a Wizard of Oz experiment, with a human really posing as the computer, I think the participant would speak much less frequently.&lt;br /&gt;&lt;br /&gt;I don't think the seemingly contradictory information in Tables 1 and 2 is surprising. 
If I'm giving speech that is supposed to be informative, I'm going to include extra information that isn't directly related to the actual shapes I'm drawing. But, as I draw things, I will put words in my speech to let the observer know what I just drew. &lt;br /&gt;&lt;br /&gt;I wonder how often pauses in the user's speech accurately reflected boundaries between phrase groups and drawings. How often did users just flow from one thing to the next? My guess is that pauses are a pretty darn good indication, like speed corners, but I'm just wondering. Also, what is the reason for the differences in the speech/sketch alignments when comparing word groups and phrase groups?&lt;br /&gt;&lt;br /&gt;The authors say that completely unrestricted (narrative) speech recognition is unduly difficult. Well, wasn't unrestricted sketch recognition the same way a decade ago? Things are hard. That's why PhDs have a job. If things were easy, you wouldn't have to study for 10 years just to get good at them. Unlimited speech recognition is unduly hard right now, and wasn't a tractable option for this paper, but it will come. Understanding "enough" of the input is good for now, but what about the other stuff, like the "junk" DNA? 
I bet it's important too.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-8053532994385855973?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/8053532994385855973/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=8053532994385855973&amp;isPopup=true' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/8053532994385855973'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/8053532994385855973'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2007/11/adler-and-davis-speech-and-sketching.html' title='Adler and Davis -- Speech and Sketching'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-7542792278468046208</id><published>2007-11-06T12:02:00.000-06:00</published><updated>2007-11-06T12:35:19.178-06:00</updated><title type='text'>Mahoney and Fromherz--Three main concepts</title><content type='html'>Mahoney, James V., and Markus P. J. Fromherz. "Three main concerns in sketch recognition and an approach to addressing them."&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Mahoney and Fromherz identify three requirements that they feel any sketch recognition technology should possess. First, it should cope with ambiguity in the sketch. 
This includes variance due to sloppy drawings, difficulty finding the correct articulation of the drawing (corner finding), and noisy drawings (stuff in the background, etc.). Second, the recognition process should be efficient and provide quick results. This means we must have methods to bound the normally exponential state space to something we can search tractably. Third, the method must be extensible to new domains, so it can't rely heavily on domain-specific knowledge.&lt;br /&gt;&lt;br /&gt;Their method is to construct a graph of the segments in the image. Using image processing techniques (not stroke inputs), they detect segment endpoints and create a node for each. The edge joining the two endpoints of a segment is called a bond edge, and edges between coincident endpoints are called links. The graph is put through a series of transformations that provide some variability in the possible interpretations of the structure. These include linking nodes that are close to each other (gap closure), linking around small segments whose ends are close to each other (spurious segment jumping, which removes false corners), adding nodes and links where lines intersect (adding corners to junctions), and merging corners where the segments continue smoothly (continuity tracing). Each of these transformations has a tunable parameter.&lt;br /&gt;&lt;br /&gt;Subgraph matching is used to determine the best fits for portions of the graph. The shapes the algorithm looks for are described in a constraint language that defines the different shapes and how they are linked together. The language also provides a criterion that the algorithm should try to minimize (sets of constraints, like LADDER's method of minimizing deviation from the ideal constraints). For instance, they say that a stick figure's head should be half as long as its torso. They limit some of the search space using a few a priori assumptions and hints, which they call salience, anchoring, grouping, and model bias. 
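&lt;br /&gt;&lt;br /&gt;To make the graph representation concrete, here is a minimal Python sketch of my own. It is not the authors' code, which works from image processing rather than the clean segments assumed here. Each segment contributes two endpoint nodes joined by a bond edge, and link_endpoints adds links between endpoints of different segments that lie within max_gap of each other. The function names and the max_gap parameter are hypothetical: max_gap=0 links only coincident endpoints, and a larger value acts like the gap-closure transformation described above.

```python
import math
from itertools import combinations


def build_graph(segments):
    """Turn segments into endpoint nodes and bond edges.

    segments: list of ((x0, y0), (x1, y1)). Nodes 2*i and 2*i + 1 are the
    two endpoints of segment i, and bond (2*i, 2*i + 1) joins them.
    """
    nodes, bonds = [], []
    for i, (p, q) in enumerate(segments):
        nodes.extend([p, q])
        bonds.append((2 * i, 2 * i + 1))
    return nodes, bonds


def link_endpoints(nodes, max_gap=0.0):
    """Link endpoints of different segments that lie within max_gap.

    max_gap=0 keeps only coincident endpoints; a larger value behaves like a
    simple gap-closure transformation.
    """
    links = set()
    for a, b in combinations(range(len(nodes)), 2):
        if a // 2 == b // 2:
            continue  # same segment: already joined by its bond edge
        if max_gap >= math.dist(nodes[a], nodes[b]):
            links.add((a, b))
    return links


segments = [((0, 0), (10, 0)), ((10, 0), (10, 10)), ((12, 10), (20, 10))]
nodes, bonds = build_graph(segments)
print(sorted(link_endpoints(nodes)))               # [(1, 2)]
print(sorted(link_endpoints(nodes, max_gap=3.0)))  # [(1, 2), (3, 4)]
```

In this toy example, raising max_gap from 0 to 3 bridges the 2-pixel gap between the second and third segments. That shows how much the resulting graph depends on a single hand-picked parameter.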
&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;They state it themselves, but hand-tuning the parameters of the perceptual organization methods that elaborate/rectify the graph is bad. It directly violates their call for recognition processes that are applicable across domains. They build their approach on the idealistic foundation of their "three concerns," then undermine it by trashing one of those concerns. To their credit, they do state the need to learn these parameters or estimate them automatically.&lt;br /&gt;&lt;br /&gt;In the worst case, subgraph matching requires exponential time. They say this is not usually the case for "most practical problems." Defined how? Is something practical only if it conforms to the simple case of stick figures? What about something more general? Even if the worst case is rare, if a sketch does take exponential time (and you cannot guarantee that it won't), you've blown another tenet of your argument (the part about interactive recognition). They rely on so many a priori assumptions to reduce the running time that I doubt this method can generalize to a multi-purpose recognizer. It has promise, but only at the risk of exponential running time.&lt;br /&gt;&lt;br /&gt;Again, no recognition results. How obnoxious. I guess you can toss tenet 1 out the door too, since you can't show that your method does any good at handling ambiguity.&lt;br /&gt;&lt;br /&gt;This seems like a good method to use with Sezgin's approach for corner finding/reconciliation. 
I wonder whether their results would have improved had they used stroke data instead of simple image processing.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-7542792278468046208?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/7542792278468046208/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=7542792278468046208&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/7542792278468046208'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/7542792278468046208'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2007/11/mahoney-and-fromherz-three-main.html' title='Mahoney and Fromherz--Three main concepts'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-8016491695044209627</id><published>2007-11-06T10:47:00.000-06:00</published><updated>2007-11-06T11:26:24.227-06:00</updated><title type='text'>Sharon and van de Panne -- Constellation Models</title><content type='html'>Sharon, D., and M. van de Panne. "Constellation Models for Sketch Recognition." EUROGRAPHICS Workshop on Sketch-Based Interfaces and Modeling, 2006.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Sharon and van de Panne's approach to sketch recognition works with a set of shapes, each drawn with a single stroke. 
These shapes are assigned labels based on maximum-likelihood fits computed from spherical Gaussian distributions estimated from training examples. The label that yields the highest likelihood (given the Gaussian parameters for that label) is assigned to that shape. Pairs of shapes are also compared against each other, and Gaussians are trained for all pairings (pruned down to pairings between mandatory shapes, as opposed to shapes deemed optional in a sketch). The chosen labels must also maintain a high likelihood under these pairwise models.&lt;br /&gt;&lt;br /&gt;Per-stroke features include the center of the bounding box (x, y coordinates), the length of the bounding box's diagonal, and the cosine of that diagonal's angle. Features between pairs of strokes include the differences between their center points (delta x, delta y) and the distance from the endpoint of one stroke to the closest point on the other stroke (plus the same distance measured in the other direction). Four-dimensional Gaussians are trained on these feature vectors, with means and diagonal covariance matrices estimated from the training data. The resulting models assign likelihoods to the feature vectors of test examples.&lt;br /&gt;&lt;br /&gt;Label assignments are found with a branch-and-bound search. To avoid a brute-force search that is exponential (number of labels^number of strokes), different bounding heuristics are used to reduce the search space. One method is to assign all possible mandatory labels and trim branches with smaller likelihoods. This is still bad because the search remains exponential (number of mandatory labels^number of strokes). Another approach is to use a multi-pass thresholding algorithm. We start at some high threshold and cut off any developing branch with a likelihood below that threshold. If no labeling is found, we lower the threshold and try again, until a full labeling is assigned. 
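&lt;br /&gt;&lt;br /&gt;Here is a minimal Python sketch of these pieces. It is built on my own simplifying assumptions rather than taken from the paper. stroke_features computes the four per-stroke features, assuming the diagonal's angle is measured from the x-axis. Each label gets a Gaussian with diagonal covariance. multipass_label runs a branch-and-bound search that, on each pass, prunes any single label assignment whose log-likelihood falls below that pass's cutoff, then loosens the cutoff until a complete labeling survives. Working in log-likelihoods turns the products into sums. Pairwise terms, optional labels, and hard constraints are left out for brevity, and all names are hypothetical.

```python
import math


def stroke_features(points):
    """Per-stroke features: bounding-box center (x, y), diagonal length, and
    the cosine of the diagonal's angle (assumed measured from the x-axis)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    diag = math.hypot(w, h)
    return (min(xs) + w / 2, min(ys) + h / 2, diag, w / diag if diag else 1.0)


def diag_gauss_logpdf(x, mean, var):
    """Log-density of x under a Gaussian with diagonal covariance (var = diagonal)."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def multipass_label(feats, models, thresholds):
    """Give every stroke a distinct label, maximizing the summed log-likelihood.

    feats: per-stroke feature tuples; models: label -> (mean, var);
    thresholds: per-assignment log-likelihood cutoffs, strictest first.
    Returns (threshold_used, score, labels), or None if every pass fails.
    """
    ll = [{lab: diag_gauss_logpdf(f, m, v) for lab, (m, v) in models.items()}
          for f in feats]
    # rest[i] is an optimistic bound on the score of strokes i..end: each takes
    # its best label, ignoring the one-stroke-per-label constraint.
    rest = [0.0] * (len(feats) + 1)
    for i in range(len(feats) - 1, -1, -1):
        rest[i] = rest[i + 1] + max(ll[i].values())

    for thresh in thresholds:
        best = None

        def search(i, used, score, labels):
            nonlocal best
            if best is not None and best[0] >= score + rest[i]:
                return  # bound: cannot beat the best complete labeling found
            if i == len(feats):
                best = (score, list(labels))
                return
            for lab, s in ll[i].items():
                if lab in used or thresh > s:
                    continue  # label already taken, or too unlikely this pass
                used.add(lab)
                labels.append(lab)
                search(i + 1, used, score + s, labels)
                labels.pop()
                used.remove(lab)

        search(0, set(), 0.0, [])
        if best is not None:
            return thresh, best[0], best[1]
    return None


print(stroke_features([(0, 0), (3, 4)]))  # (1.5, 2.0, 5.0, 0.6)
models = {"eye": ((0, 0, 1, 1), (1, 1, 1, 1)),
          "mouth": ((0, 10, 3, 1), (1, 1, 1, 1))}
thresh, score, labels = multipass_label([(0, 10, 3, 1), (0, 0, 1, 4)],
                                        models, [-5, -50, -1000])
print(thresh, labels)  # -50 ['mouth', 'eye']
```

In the toy run, the second stroke's best label scores about -8.2, so the first pass (cutoff -5) finds no complete labeling and the second pass (cutoff -50) succeeds.&lt;br /&gt;&lt;br /&gt;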
The third approach is to set a minimum likelihood for each individual edge in the search tree (i.e., each single label assignment). Any label assignment with a likelihood below this threshold terminates that branch immediately. This per-assignment threshold is the form of multi-pass thresholding the authors actually implement (as opposed to decreasing thresholds on the likelihood of the labeling so far). The authors also discuss using hard constraints, such as forcing mouth parts to appear below nose parts.&lt;br /&gt;&lt;br /&gt;The authors don't give any recognition results, only the speed-up achieved by the multi-pass algorithm. I guess it doesn't do so hot.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;The features they use say nothing about the contents of the bounding box. Something like Paulson's MARQS recognizer would probably give better context-free recognition that doesn't depend solely on the size/aspect of the stroke. This would likely help recognition and decrease the time spent labeling the parts, especially if employed in a bottom-up fashion.&lt;br /&gt;&lt;li&gt;On computing the model's likelihood:&lt;br /&gt;&lt;ol&gt; &lt;br /&gt;&lt;li&gt;They could use the log-likelihood and turn the products into sums to save computation time. Since log is monotonically increasing, maximizing log(x) also maximizes x. &lt;br /&gt;&lt;li&gt; They assume independence between the probabilities (otherwise you can't simply multiply them together). This is usually a safe assumption and greatly reduces computation. But, for instance, does assigning the label "nose" to this shape have any effect on the label "mouth" for that other shape? Possibly, but learning joint probabilities requires a great deal of training examples. I advocate the independence assumption, but just wanted to point this out since they don't state it explicitly.&lt;br /&gt;&lt;li&gt;They assume uniform priors. I don't agree with this assumption at all. They say it's because of a lack of training data. 
Well, you're already training Gaussians on scarce data, so you might as well estimate priors too. This would help with optional shapes, where an eyebrow may or may not appear. So if we have something we think might be an eyebrow (especially since their individual shape recognition only uses bounding box information), we might want to weight that with the prior.&lt;br /&gt;&lt;/ol&gt;&lt;br /&gt;&lt;li&gt;They have 20-60 training examples per shape. That's a lot! Plus they don't give recognition results, so we can only assume their method did horribly. I guess having that much data didn't help. Lack of training data is always an issue, so don't complain about it. If you can't find a good method for training priors, or hard constraints, using the limited data you have, you're doing it wrong.&lt;br /&gt;&lt;li&gt;Different classifiers work better for different amounts of training data. They don't have enough data to fully estimate a set of multivariate Gaussians and priors for a Bayes classifier (ML or MAP). 
I think something like k-nearest neighbors would work well, treating the feature vectors as points in feature space.&lt;br /&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-8016491695044209627?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/8016491695044209627/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=8016491695044209627&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/8016491695044209627'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/8016491695044209627'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2007/11/sharon-and-van-de-panne-constellation.html' title='Sharon and van de Panne -- Constellation Models'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-2917573072025385632</id><published>2007-10-30T11:22:00.000-05:00</published><updated>2007-10-30T11:58:08.165-05:00</updated><title type='text'>Oltmans PhD Thesis</title><content type='html'>Oltmans, Michael. "Envisioning Sketch Recognition: A Local Feature Based Approach to Recognizing Informal Sketches." MIT, May 2007. 
PhD Thesis.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Oltmans seeks a method of recognizing hand-drawn shapes that harnesses a form of visual recognition rather than feature-based approaches (like Rubine or Sezgin). He analyzes individually drawn, isolated shapes, and he also automatically extracts and labels shapes in hand-drawn circuit diagrams.&lt;br /&gt;&lt;br /&gt;Stroke preprocessing includes rescaling the shape and resampling the points in the stroke, interpolating new ones if needed. He then marches along the stroke and places a radial histogram ("bullseye") every five pixels, each shaped like a dartboard with a certain number of wedges and log-spaced rings. Bullseyes are rotated to follow the trajectory of the stroke, making them rotationally invariant. The bullseyes have a third dimension, the stroke direction through each bin, split over 0-180 degrees in 45-degree increments. Each bullseye counts the ink pixels falling in each bin (location relative to the center plus direction), producing a frequency vector.&lt;br /&gt;&lt;br /&gt;The frequency vectors are compared against a codebook built from pre-labeled training shapes. The parts (bullseye vectors) of the labeled shapes are clustered using QT clustering, and the mean vector of each cluster is put into the codebook. A shape to be classified has its constituent bullseye vectors (its parts) compared to every entry in the codebook, giving a measure of distance (a normalized sum of squared differences between vector elements). Taking, for each codebook entry, the minimum distance over all of the shape's parts gives a match vector. 
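&lt;br /&gt;&lt;br /&gt;Here is a simplified Python sketch of the bullseye and match-vector steps. It uses assumptions of my own, not Oltmans's actual parameters. bullseye() bins the stroke's other sample points by log-spaced ring and by wedge. It measures angles relative to the local stroke direction so the descriptor rotates with the trajectory, and it omits the direction dimension. Histograms are normalized to sum to 1, and distances are plain sums of squared differences. The function names and ring radii are hypothetical. match_vector() then takes, for each codebook entry, the minimum distance over all of the shape's parts.

```python
import math


def bullseye(points, center_idx, n_wedges=8, ring_edges=(2.0, 4.0, 8.0, 16.0)):
    """Simplified bullseye: a histogram of the stroke's other sample points
    around points[center_idx], binned by (ring, wedge). Ring radii are
    log-spaced; angles are measured relative to the local stroke direction so
    the descriptor rotates with the trajectory. Direction bins are omitted."""
    cx, cy = points[center_idx]
    # Estimate the local trajectory from the neighbouring samples.
    px, py = points[max(center_idx - 1, 0)]
    qx, qy = points[min(center_idx + 1, len(points) - 1)]
    heading = math.atan2(qy - py, qx - px)
    hist = [0.0] * (n_wedges * len(ring_edges))
    for j, (x, y) in enumerate(points):
        if j == center_idx:
            continue
        r = math.hypot(x - cx, y - cy)
        ring = next((k for k, edge in enumerate(ring_edges) if edge >= r), None)
        if ring is None:
            continue  # beyond the outermost ring
        angle = (math.atan2(y - cy, x - cx) - heading) % (2 * math.pi)
        wedge = int(angle / (2 * math.pi) * n_wedges) % n_wedges
        hist[ring * n_wedges + wedge] += 1
    total = sum(hist)
    # Normalize so shapes with different amounts of ink are comparable.
    return [h / total for h in hist] if total else hist


def sq_dist(u, v):
    """Sum of squared differences between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def match_vector(parts, codebook):
    """Entry j is the smallest distance from any of the shape's parts to
    codebook entry j."""
    return [min(sq_dist(p, c) for p in parts) for c in codebook]


stroke = [(0, 0), (1, 0), (2, 0), (3, 0)]
parts = [bullseye(stroke, i) for i in range(len(stroke))]
codebook = [parts[1], [1.0] + [0.0] * 31]  # hypothetical two-entry codebook
print(match_vector(parts, codebook)[0])  # 0.0
```

For brevity, this sketch centers a bullseye on every sample point. Matching the thesis would mean resampling the stroke and centering one every five pixels.&lt;br /&gt;&lt;br /&gt;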
The match vectors are classified using support vector machines (one-vs-one strategy) to assign each example its label.&lt;br /&gt;&lt;br /&gt;To extract individual shapes from a full diagram, Oltmans marches along the strokes and creates windows of several fixed sizes (to account for different scaling of the shapes). The ink in each window is classified, and if it is close enough to a given shape, the window is added to the list of candidate shape regions. The candidate regions are clustered using EM, with large clusters split into two regions and clustered again. The cluster representatives form the final list of candidate shapes, which are sent back through the classifier to obtain shape labels.&lt;br /&gt;&lt;br /&gt;To evaluate his method, Oltmans had 10 users draw circuit diagrams using a given set of shapes, but with no instructions on how to draw the shapes or lay out the circuit. The 109 resulting drawings were hand-labeled. Individual shapes were extracted from the drawings by hand and classified, giving 89.5% accuracy. He also evaluated a separate dataset of isolated shapes, HHreco, which has simpler shapes with little variance, and obtained 94.4% accuracy. When looking at how well his method extracted shapes from the full circuit diagrams, 92.3% of the regions were extracted.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;I have a lot of questions, since this is a large body of work (there's much more to pick apart in 90 pages than in 6). So I'll be brief and bullet-point them.&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;br /&gt;&lt;li&gt;Were any dimensionality reduction techniques employed, especially on the codebook, to try to figure out which parts were the most indicative of shape class? With so many clusters and parts (one every 5 pixels), it seems this might have been advantageous. 
It might also let you construct a full codebook at first (rather than limiting it to 1000 parts/100 clusters) and whittle it down from there.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;You seem to claim purely visual recognition and avoidance of style-based features, yet you use direction bins as features in the bullseyes. Does this really buy you anything, since a user can draw however she wishes? You're already orienting to the trajectory.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;When extracting shapes from circuit diagrams, you have a special WIRE class to catch your leftovers. How hard would this be to generalize? Would you just throw out anything that isn't labeled with enough probability?&lt;/li&gt;&lt;br /&gt;&lt;li&gt;In your results, you give confusion matrices with the recall for each shape class. It seems like the classes with more training examples had better recall. Did you find this to be strictly true (obviously it's true to some extent), or is it simply that some shapes are harder than others?&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Support vector machines are complex and expensive to train, especially when you need n(n-1)/2 = O(n^2) of them (n being the number of classes). Were any other, simpler classifiers tried?&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Did you experiment with the granularity of the bins, or was that the job of Belongie et al.? 
Did you try different spacings of the bullseyes (rather than every 5 pixels)?&lt;/li&gt;&lt;br /&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-2917573072025385632?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/2917573072025385632/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=2917573072025385632&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/2917573072025385632'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/2917573072025385632'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2007/10/oltmans-phd-thesis.html' title='Oltmans PhD Thesis'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-4169199399089785025</id><published>2007-10-25T14:38:00.000-05:00</published><updated>2007-10-25T15:02:56.397-05:00</updated><title type='text'>Oltmans -- ASSISTANCE</title><content type='html'>Oltmans, Michael, and Randall Davis. "Naturally Conveyed Explanations of Device Behavior." PUI 2001.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Oltmans created a system called ASSISTANCE that can take a recognized sketch (recognized by the ASSIST program) and speech input (via IBM's ViaVoice program) and construct causal links for bodies in a mechanical engineering sketch. 
The example given in the paper is a Rube Goldberg machine involving several bodies, springs, a pulley, and levers. The system is given force arrows that indicate direction and spoken relationships that give more cause/effect information.&lt;br /&gt;&lt;br /&gt;The motivation for the system is that most current design systems, like CAD, require the user to put far too much thought and detail into early phases of the design. For instance, when trying to create a Rube Goldberg machine to crack an egg (the example above) that involves a spring, you don't want to have to specify things like the spring coefficient or the elasticity of the various bodies. Instead, you just want a rough idea to see if things work. Using ASSISTANCE, you first sketch a diagram, which is interpreted off-line by another program, ASSIST. You then annotate the diagram with multimodal explanations of the system's behavior (not specific parameters), using drawn arrows, speech, and pointing.&lt;br /&gt;&lt;br /&gt;ASSISTANCE adds the explanation annotations to a rule-based system, which draws consistent conclusions from the set of behaviors, yielding a series of causal structures. These structures describe how things happen in the system (body 1 hits body 2 and causes it to move, etc.). Searching for causal links can be time-consuming, but the authors find it to be very quick most of the time.&lt;br /&gt;&lt;br /&gt;No experimental results on the system's efficacy were given beyond the authors' own experiences.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Speaking seems like a nice way to get more information to the design program. This is the first paper I've seen (though not necessarily the first one written) that combines sketching and speech in a multimodal interface. It seems like some of the issues that plague both fields (context, intention, etc.) 
could be resolved, with each mode of input helping to clarify the other. Generally, the more information we have (a user speaking about the sketch rather than just the static sketch), the better equipped we are to do a good job. However, the obvious problem is that both speech and sketch recognition are Hard Problems (TM) and can be very daunting on their own, let alone when combined. Luckily for Michael, he used existing software.&lt;br /&gt;&lt;br /&gt;The authors say that their system rarely needs exponential time to search the rule-based system for a consistent set of causal links. However, the worst-case running time is indeed exponential. It seems like they're just getting lucky because their domain is very small and limited. How much would a richer vocabulary and more domains increase the complexity of the system? Obviously exponentially. Would these additions make the truth maintenance system hit exponential running times more often? Rule-based / knowledge-based deductive systems are neat, but they are extremely complex and in general attack problems that are at least NP-hard. Expecting good performance out of them, using anything other than an approximation, is foolhardy.&lt;br /&gt;&lt;br /&gt;I wish there were results, at least a usability study where people rated their system. But alas, there were not. I really don't want to hear the authors' opinions of their own system, since they want to get published and aren't going to say anything like "Our system didn't work at all and was completely counter-intuitive." Not that it is, or that I'd feel that way if I tried it out. 
It's hyperbole.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-4169199399089785025?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/4169199399089785025/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=4169199399089785025&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/4169199399089785025'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/4169199399089785025'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2007/10/oltmans-assistance.html' title='Oltmans -- ASSISTANCE'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-5629990191563382503</id><published>2007-10-25T14:18:00.000-05:00</published><updated>2007-10-25T14:37:59.979-05:00</updated><title type='text'>Gross and Do -- the Napkin</title><content type='html'>Gross, Mark, and Ellen Yi-Luen Do. "Ambiguous Intentions: a Paper-like Interface for Creative Design." ACM UIST 1996.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Gross and Do seek to implement a system in which a designer can sketch plans with a great deal of ambiguity and flexibility. This keeps the initial phase of the design from becoming cumbersome and restrictive, deferring possible interpretations and constraints until later in the drawing process. 
This also relieves the system of some of its responsibilities: it is not forced to make accurate predictions for every drawn object right away, and some decisions can be deferred until more contextual clues are available.&lt;br /&gt;&lt;br /&gt;Their system recognizes drawn primitives (called glyphs: boxes, circles, etc., as well as lines and points) using a simple templating approach. A stroke is analyzed and compared to stored example strokes, and it is classified as a given glyph type if it matches one of the templates with enough certainty. Matches are ranked against thresholds, with things like context helping to break close ties.&lt;br /&gt;&lt;br /&gt;Context is established by the user giving the system examples. For instance, to say that four little squares sitting around a large square mean a dining room table, the user would draw the boxes and edit the constraints on the various shapes. However, those 5 boxes might be a table in one context (a room plan for a house, for instance) or houses situated around a pond in another (a landscaping drawing). Thus, users can define different contexts that the system can identify by looking for symbols specific to one context or another. Contexts are kept in chains from specific to general. So, for instance, the context chains for room plans and for landscaping would both contain the 5-box symbol, with a different meaning in each chain.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Defining the constraints/contextual clues requires user input. It also seems like these constraints might be very rigid. Especially when the user has to draw examples of the constraints and context clues, this seems like a very time-consuming process. Coupled with the difficulty of the system generating constraint lists from the drawings (which we saw in LADDER), this seems somewhat inefficient. 
However, I suppose that's the inherent difficulty of describing constraints with a fixed language. It's hard enough for one person to describe a geometric layout to another person using any vocabulary they choose. Intent is very difficult to capture, especially with a limited grammar.&lt;br /&gt;&lt;br /&gt;Their system for matching glyphs seems very brute-force and hacked together, in my opinion. It would have been nice to see some results on their recognition rates. One problem I can see is their scheme of splitting the stroke's bounding box into a grid. What happens if you need more detail? The system won't be able to add new shapes that need finer granularity in their paths. Or, if you do add more granularity, permuting the possible paths under rotation and reflection becomes more difficult. Also, templating carries a lot of overhead: you have to store all the templates and make O(n) comparisons (n being the number of templates), which is not the case for other classification methods (like linear classifiers). I can see the strength of templating (it can help avoid user- and domain-dependent features like those of Rubine and Long), but their version feels hacked together. 
Maybe because they're architects.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-5629990191563382503?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/5629990191563382503/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=5629990191563382503&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/5629990191563382503'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/5629990191563382503'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2007/10/gross-and-do-napkin.html' title='Gross and Do -- the Napkin'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-1481419238841412589</id><published>2007-10-16T11:04:00.000-05:00</published><updated>2007-10-16T12:03:22.860-05:00</updated><title type='text'>Herot - Graphical Input Throuch Machine Recognition of Sketches</title><content type='html'>Herot, Christopher. "Graphical Input Through Machine Recognition of Sketches." SIGGRAPH 1976: 97-102&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Herot's paper, from the 1970s, describes a system that seeks a way to recognize sketches independently of the semantics of context and domain (i.e., 
recognizing low-level primitives and combining them hierarchically), a way of using domain context to construct higher-order shapes, and finally a way to give the user direct involvement in the recognition process, tuning the system's capabilities to that user's style.&lt;br /&gt;&lt;br /&gt;Herot's system, ancient by modern standards, consisted of a microcomputer, several types of tablets for sketching input, and even a storage tube! Input into the system was via the passive tablet, over which a large piece of paper was taped and drawn on using a modified graphite pencil. The pencil's eraser allowed for drawing a "negative line" to remove points from the system. The pen's position was sampled at a constant rate determined by a user-programmable variable. The points were used by the HUNCH system to perform corner detection, using speed and curvature data. HUNCH fit lines to the strokes, and a separate routine, CURVIT, fit B-splines. Herot found, however, that setting the thresholds for recognizing lines/curves was difficult and tended to be user-dependent.&lt;br /&gt;&lt;br /&gt;Herot's system also performed endpoint combination, called latching. Its first version used a fixed radius and joined any endpoints within that radius. This gave error-prone results when the drawing was at a smaller scale than the radius. He tried to take the speed of the strokes into account to gauge how intentional the endpoints were, but he still had problems, especially with incorrectly latching 3D figures drawn on the 2D tablet. He handles overtracing by trying to merge several lines into one.&lt;br /&gt;&lt;br /&gt;The second part of his system incorporates context from the architectural domain to give the sketch system more understanding. Basically, he seeks to build a bottom-up hierarchical recognizer that uses context to put things together. He also looks at top-down approaches, finding a "room" by looking for its "walls". 
He notes the need for some sort of ranking system, or something similar, in order to allay the effects of "erroneous or premature matches."&lt;br /&gt;&lt;br /&gt;Finally, Herot constructs an interactive system where the user can help guide the recognition decisions and modify the system's operating parameters through feedback so it adjusts to the user's personal drawing style.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Wow! Curvature &lt;em&gt;and speed&lt;/em&gt; data used for corner detection in the 1970s. So I guess Sezgin's work wasn't all that groundbreaking then? Sure, he did it on modern computers and did a much better job of it, but is that an artifact of improved technologies or his mighty powers as a researcher? Seeing this use of curvature data roughly 25 years before Sezgin's paper, I lean toward simple advances in computing power and in the level of knowledge in the field. Not that I would dismiss Sezgin's mightiness on any account. He did things I could never do. But even he stands on the shoulders of his predecessors.&lt;br /&gt;&lt;br /&gt;A nugget of brilliance:&lt;br /&gt;&lt;blockquote&gt;A human observer resolves these ambiguities through the application of personal knowledge and years of learning how to look at pictures.... Before a machine can effectively communicate with a human user about a sketch [sic] it must possess a similar body of knowledge and experience.&lt;/blockquote&gt;&lt;br /&gt;Exactly! People keep crying out and gnashing their teeth for systems that capture human intention. They want systems that can perform better than humans! For simple systems and domains this is not a problem. In the future, I'm sure we can get into harder domains as the horsepower of computers and the intelligence of algorithms increase. 
But humans grow up and learn these things over 20+ years of constant reinforcement, and they automatically incorporate things like context and prior knowledge that a computer can never know (that's a lot of knowledge to program and sift through). That's not something you can simply program and execute as a method call. &lt;/rant&gt;&lt;br /&gt;&lt;br /&gt;The construction of the hierarchical system was interesting to read, seeing as how everything he called for now exists in some form or fashion. However, Herot was overly pessimistic, and ultimately incorrect, in predicting that hierarchical learning would require context (primitives can be learned automatically, e.g., by Paulson's recognizer), that context is required "at the lowest levels," and that successful approaches would require knowledge-based systems (i.e., artificial intelligence; no longer the case). &lt;br /&gt;&lt;br /&gt;Lastly, it would appear Herot's tablet is passive since it works with a modified pencil, probably relying on some sort of pressure sensitivity at a narrow tip. 
Does this prevent me from resting my arm or hand on it while I draw?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-1481419238841412589?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/1481419238841412589/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=1481419238841412589&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/1481419238841412589'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/1481419238841412589'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2007/10/herot-graphical-input-throuch-machine.html' title='Herot - Graphical Input Through Machine Recognition of Sketches'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-2662781342501992379</id><published>2007-10-11T10:34:00.000-05:00</published><updated>2007-10-11T11:44:43.049-05:00</updated><title type='text'>Veselova - Shape Descriptions</title><content type='html'>Veselova, Olya and Randall Davis. "Perceptually Based Learning of Shape Descriptions for Sketch Recognition." AAAI 2004.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Veselova and Davis want to build a language for describing sketches arranged into iconic shapes and a system that uses said language to recognize instances of the shapes. 
Some of the difficulty with this problem lies in the subjective nature of perception: what different people mean by "different" or "same," and which features are "significant" in making these judgment calls. The authors draw on the psychology of perception and propose a system that extracts the significant features and weights them in order to decide whether or not two shapes should be regarded as the same.&lt;br /&gt;&lt;br /&gt;Based on perceptual studies, the authors group the description vocabulary that is used to describe constraints between primitives in the shapes into two categories. The first group is the singularities. These are the properties of a shape where "small variations in them make a qualitative difference" in the perception of a shape. Examples of singularities are the notions of vertical, horizontal, parallel, straight, etc. After the vocabulary was constructed, each constraint was examined on a set of shapes to determine subjectively which constraints made the most difference in judging similarity, and each constraint was assigned a rank accordingly.&lt;br /&gt;&lt;br /&gt;The importance of each constraint is not only defined by its rank relative to its peers, but is also adjusted using heuristics that take into account obstruction, grouping, and tension lines. Obstruction is a measure of how many primitives lie between objects a and b. If the obstruction is high, the user will not be as likely to constrain a and b with each other. If, however, the obstruction is low, users will easily constrain a and b with each other. Tension lines describe how the end- and midpoints of lines are aligned with each other. The human brain tends to like things well aligned, so if things in the shape are aligned, the constraints are boosted. 
Shapes that are grouped are also treated as wholes rather than individuals, so constraints between elements of the same group are boosted, and constraints between elements of different groups are weakened.&lt;br /&gt;&lt;br /&gt;To evaluate their work, the authors had 33 users look at 20 variations ("near-miss" examples similar to Hammond and Davis' work) of 9 shapes. The variations were chosen so that half still met the constraints and half did not. The users answered "yes" or "no" to each variation according to whether they felt it still counted as an example of that shape. On the variations where 90% of the users agreed with one another on the right answer, the system matched the majority answer 95% of the time (the same as a user selected at random). For variations where 80% of the people agreed, the system got 83%, while a random user would get 91%. Thus, for the most part, the system's judgments of shape similarity matched the majority of the human answers.&lt;br /&gt;&lt;br /&gt;Future work for this project includes building a shape recognizer that uses this system, supporting more than just lines and arcs, adding a way of expressing extremes and "must not" constraints, and handling more than just pairwise constraints. They also want the system to handle objects that have an arbitrary number of primitives (like a dashed line: how many dashes?).&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;It sounds like the authors used their own opinions to determine the ranking values for the constraints listed in the table in Section 3.2. I wish they had performed a user study. On the one hand, I can see why they would be better suited to assigning these ranking values: they understand the details of the problem. They have the expert knowledge to know what to look for and what might make the most difference. However, that same knowledge is what makes their opinion less credible. 
If you build a real-world system, it's not going to be used by experts 24/7. A bunch of naive and ignorant (to the domain and problem of sketch recognition, not in general) users are going to be making their own decisions as to what constitutes similarity and what things are important in making those judgment calls. In short, I would have liked to see more user input in the vocabulary ranking stage.&lt;br /&gt;&lt;br /&gt;I wonder how the scalar coefficients were selected for the penalties on obstruction, tension lines, and grouping. How much do these values affect the final outcome of the prediction? I would like to see a graph of results vs. values for these variables. It just seems like these were picked as a best guess. Surely they weren't, but the paper gives me no reason to believe otherwise. In their experiments section, the authors state that "people seemed to pay less attention to individual detail (aspect ratio, precise position, etc.) of the composing shapes in the symbol than the system biases accounted for." I wonder, then, if these biases could be corrected by experimenting with the constraint rankings and the coefficients listed above.&lt;br /&gt;&lt;br /&gt;Also, they said they don't have a shape recognition engine built yet, so I am confused as to how they performed the experiments. Did they skip recognition entirely, generating the 20 examples by hand and manually violating constraints? If so, this seems like a Bad Thing (TM) that is very prone to error and adds subjectivity. Maybe they can borrow Hammond's near-miss generator.&lt;br /&gt;&lt;br /&gt;But, after saying all that, I liked this paper. The authors seem to have found a way to say which constraints are more important than others. And, to their credit, their results seem to agree reasonably well with the majority of users. 
However, without having a sketch recognition engine to do these things automatically, I wonder how biased their results are.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-2662781342501992379?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/2662781342501992379/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=2662781342501992379&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/2662781342501992379'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/2662781342501992379'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2007/10/veselova-shape-descriptions.html' title='Veselova - Shape Descriptions'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-7556924189268997244</id><published>2007-10-09T12:06:00.000-05:00</published><updated>2007-10-11T11:45:10.320-05:00</updated><title type='text'>Hammond and Davis -- Constraints and Near-Miss Examples</title><content type='html'>Hammond, Tracy and Randall Davis. "Interactive Learning of Structural Shape Descriptions from Automatically Generated Near-Miss Examples." 
IUI 2006.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Hammond and Davis are concerned with shape description languages and the way they are used by both human users and computer programs that automatically generate descriptions given a shape. The problem is that descriptions tend to be either over-constrained, meaning the description is too specific to one particular example and not generalized enough, or under-constrained, meaning the description does not capture all of the defining characteristics of the shape. The goal, then, is to create a system that helps the user identify both unnecessary and missing constraints using automatically generated near-miss examples, by having the user tell the learning algorithm whether each example is a positive instance of the shape or not.&lt;br /&gt;&lt;br /&gt;The system starts with a hand-drawn shape and a set of constraints that provide a positive match for that shape. If the set of constraints is automatically generated from the shape, the match is guaranteed to be positive, but the description does not necessarily capture user intention. If the description is hand-written, it reflects the user's intention more directly, but the user may still omit or overstate constraints, so the list might not accurately describe the intended shape. In either case, the set of constraints will probably need to be modified.&lt;br /&gt;&lt;br /&gt;The authors start by removing constraints that are too specific (meaning the description is over-constrained). Constraints are tested by negating one constraint at a time and generating example shapes with that negated constraint. If the user says the shape is still a positive example of what he intended, the original constraint is unnecessary and is discarded. Constraints that are needed will turn the generated shape into a negative example of the user's intention and will be kept. 
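&lt;br /&gt;&lt;br /&gt;The negate-and-ask loop used for pruning, together with the analogous pass for finding missing constraints, can be illustrated with a minimal Python sketch. This is not the authors' implementation: constraints are plain strings, and the user_says_positive callback is a hypothetical stand-in for "solve the equations in Mathematica, draw the near-miss shape, and ask the user yes or no." The scaling and rotation tests are left out.&lt;pre&gt;
```python
from typing import Callable, FrozenSet, List

# Hypothetical oracle: given a (near-miss) description, report whether the
# drawn shape would still be accepted by the user as the intended shape.
Oracle = Callable[[FrozenSet[str]], bool]


def negate(constraint: str) -> str:
    """Symbolic negation of a constraint name (illustrative only)."""
    return "not " + constraint


def prune_over_constrained(description: List[str], user_says_positive: Oracle) -> List[str]:
    """Drop each constraint whose negation still yields a positive example."""
    kept = list(description)
    for c in description:
        near_miss = frozenset(x for x in kept if x != c) | {negate(c)}
        if user_says_positive(near_miss):
            kept.remove(c)  # shape is fine without c, so c was over-specific
    return kept


def add_under_constrained(description: List[str], candidates: List[str],
                          user_says_positive: Oracle) -> List[str]:
    """Add each candidate whose negation yields a negative example."""
    result = list(description)
    for c in candidates:
        near_miss = frozenset(result) | {negate(c)}
        if not user_says_positive(near_miss):
            result.append(c)  # violating c breaks the shape, so c must hold
    return result


# Toy run with a simulated user who only requires these two constraints.
REQUIRED = {"parallel(l1, l2)", "equal_length(l1, l2)"}


def simulated_user(desc: FrozenSet[str]) -> bool:
    return not any(negate(r) in desc for r in REQUIRED)


pruned = prune_over_constrained(
    ["parallel(l1, l2)", "horizontal(l1)", "equal_length(l1, l2)"], simulated_user)
completed = add_under_constrained(
    ["parallel(l1, l2)"], ["equal_length(l1, l2)", "vertical(l2)"], simulated_user)
print(pruned)     # ['parallel(l1, l2)', 'equal_length(l1, l2)']
print(completed)  # ['parallel(l1, l2)', 'equal_length(l1, l2)']
```
&lt;/pre&gt;In the toy run, the simulated user only cares that the two lines are parallel and of equal length, so the over-specific horizontal(l1) constraint is pruned, and equal_length(l1, l2) is added back when it is missing.&lt;br /&gt;&lt;br /&gt;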
Scaling and rotational invariance are also tested at this stage, since requiring certain lines to be horizontal or of a certain length may over-constrain the description.&lt;br /&gt;&lt;br /&gt;Second, the description is tested for being under-constrained by generating a list of possible constraints (using some heuristics and filters to keep the list to a reasonable size) that are not included in the shape description. Again, one at a time, the negation of each of these constraints is added and examples are generated. If the example is said to be positive, the constraint is truly not needed (the shape is acceptable even when the constraint does not hold). If the example is said to be negative, the constraint is needed (violating it breaks the shape, so the constraint itself must hold).&lt;br /&gt;&lt;br /&gt;Shapes are generated from a system of algebraic equations that describe them. The unknowns are solved using third-party software (Mathematica). If a solution exists, the solved values are used to draw the shape.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;This is a pretty solid paper and is easy to understand. It doesn't feel like the authors have left out too much. One thing I wonder about, however, is the running time and storage requirements of the algorithm. The over-constrained case doesn't seem to be a problem because you're starting with a fixed set of constraints and pruning it. However, when adding constraints, even with the heuristic method, you're going to have a large number of candidate constraints to test. Admittedly this is a heuristic, but is it a good one? Do the authors have any way to show that it does better than some other approach? Even limiting the number of candidates to n^2, that's still n^2 systems of equations to solve, n^2 shapes to show to the user, and n^2 times the user has to say 'yes' or 'no'. 
Is that simply the price one has to pay to get a "perfect" description?&lt;br /&gt;&lt;br /&gt;Overall I would like less "this is how we do it" and more "this actually works" out of the paper. How well does the system generate final descriptions that work? How long does it take? How many constraints does it remove/add on average? It sounds like a good idea, but is it worth it to me in the real world, or is it just something that's nice to talk about?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-7556924189268997244?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/7556924189268997244/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=7556924189268997244&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/7556924189268997244'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/7556924189268997244'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2007/10/hammond-and-davis-constraints-and-near.html' title='Hammond and Davis -- Constraints and Near-Miss Examples'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-4531286041342413541</id><published>2007-10-04T12:28:00.000-05:00</published><updated>2007-10-04T12:43:39.429-05:00</updated><title type='text'>Hammond and Davis -- LADDER</title><content type='html'>Hammond, Tracy, and Randall Davis. 
"LADDER, A Sketching Language for User Interface Developers." &lt;u&gt;Computers &amp;amp; Graphics&lt;/u&gt; 29 (2005): 518-532.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Hammond and Davis want to create a system that enables the rapid development of sketch recognition systems. They want the flexibility to let the user define shape descriptions and how to display those shapes, without burdening the user with the knowledge of exactly how to recognize those shapes from pen input.&lt;br /&gt;&lt;br /&gt;The authors propose the LADDER language, which is composed of predefined shapes, constraints, editing behaviors, and display methods. The predefined shapes are simple primitives such as lines, arcs, circles, etc. Constraints (either hard, which must be met, or soft, which are optional) can be placed on sets of these primitives to define higher-level shapes. Once a shape has been recognized, the user can specify how it is to be displayed. He can also specify which editing actions are allowed on the shape. To generate the syntax for the shapes, the authors performed a user study where they asked 30 students to verbally describe shapes using increasingly limited vocabularies. Their syntax is shape-based, which is more intuitive than a feature-based syntax and is independent of drawing style. However, the language has limitations: the grammar is fixed, it is limited to very iconic, regular, and simple shapes, it supports only a fixed set of primitive constraints, and it is bad at handling curves.&lt;br /&gt;&lt;br /&gt;Shapes are defined by a set of component shapes (either primitives or other defined shapes), constraints, aliases to simplify naming, and editing and display behaviors. Shapes may be abstract for hierarchical behaviors, and may be grouped into units for chaining behaviors. Vectors of components can be used to define a variable number of sub-shapes. 
Shapes can be edited after drawing by specifying editing rules.&lt;br /&gt;&lt;br /&gt;Strokes are recognized as soon as they are drawn. The primitives are placed into a Jess knowledge-based system that is searched for any completed shapes. LADDER automatically generates the Jess code.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;The authors skirt around the issue of representing curves. They say it is hard to do and their system is not good at it, but then later say they have the primitives Bezier curve, curve, and arc. I wonder how the representations differ for each kind of curve. How are they able to generalize these structures while still maintaining user-intended semantics? Is it good enough to just say "there is a curve here," or do the control points have to be strictly defined? Overall, I wish more information on curve handling had been provided.&lt;br /&gt;&lt;br /&gt;I liked the use of isRotatable, which lets me be very strict in defining relations between components but still allows those components to be rotated to any orientation. I wonder how much computation this adds to the recognition process.&lt;br /&gt;&lt;br /&gt;Jess seems like a very complicated route to shape recognition. I wonder if some sort of hierarchical data structure could have been used for recognition... 
Will have to think about this more.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-4531286041342413541?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/4531286041342413541/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=4531286041342413541&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/4531286041342413541'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/4531286041342413541'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2007/10/hammond-and-davis-ladder.html' title='Hammond and Davis -- LADDER'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-6789717798653970030</id><published>2007-09-27T11:26:00.000-05:00</published><updated>2007-09-27T12:23:50.157-05:00</updated><title type='text'>Paulson and Hammond -- New Features and Ranking</title><content type='html'>Paulson, Brandon, and Tracy Hammond. "Recognizing and Beautifying Low-Level Sketch Shapes with Two New Features and Ranking Algorithm." In submission, 2007.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Paulson and Hammond construct a system for recognizing low-level primitives made with single strokes, combining the primitives together in a hierarchical fashion and beautifying the strokes. 
Previous approaches to this problem used user- and style-dependent features (e.g., Rubine, Long) that required many training examples. Corner-detection methods (e.g., Sezgin, Yu and Cai) are able to find primitives but not construct complex hierarchies out of them for beautified strokes. The authors' improvements are two new features, plus a ranking system to choose the best of many possible fits to complex strokes.&lt;br /&gt;&lt;br /&gt;The first stage of their system is pre-recognition computation: features, overtrace detection, and stroke closure detection. The first of the two new features is normalized distance between direction extremes (NDDE), the fraction of the stroke's length that lies between the points with the highest and lowest direction values (arcs are high; polylines with many changes tend to be lower). The second new feature is direction change ratio (DCR), the ratio of the maximum direction change to the average direction change (somewhat like acceleration: polylines change direction abruptly at their corners and score high, while arcs are smoother and score low). Overtracing is detected by looking for a total direction change greater than 2*PI. Stroke closure is tested by making sure the endpoints are "close enough" together.&lt;br /&gt;&lt;br /&gt;The strokes are then fed into a series of primitive recognizers to test for lines, polylines, ellipses, circles, arcs, curves (Bezier of order &lt; 5), spirals, and helixes. If none of the primitive tests pass, a complex shape is constructed by splitting the stroke at its highest-curvature point and recursively feeding the substrokes into the primitive recognizers (similar to Yu). Once a complex fit is obtained, the substrokes are examined to see if any can be recombined. The interpretations (primitive or complex) that are deemed to fit the stroke are ranked according to complexity. 
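&lt;br /&gt;&lt;br /&gt;To make the two new features concrete, here is a minimal Python sketch based only on the definitions summarized above, not on the authors' code. It assumes direction is the unwrapped atan2 angle of each segment between consecutive points, and it locates each direction along the stroke at the segment's starting point; the paper may differ in details (for example, how stroke endpoints are handled).&lt;pre&gt;
```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def directions(points: List[Point]) -> List[float]:
    """Unwrapped direction (radians) of each segment between consecutive points."""
    dirs: List[float] = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        d = math.atan2(y1 - y0, x1 - x0)
        if dirs:
            # Unwrap so crossing the +/-pi boundary is not read as a 2*pi jump.
            while d - dirs[-1] > math.pi:
                d -= 2 * math.pi
            while dirs[-1] - d > math.pi:
                d += 2 * math.pi
        dirs.append(d)
    return dirs


def arc_lengths(points: List[Point]) -> List[float]:
    """Cumulative stroke length up to each point."""
    lengths = [0.0]
    for p, q in zip(points, points[1:]):
        lengths.append(lengths[-1] + math.dist(p, q))
    return lengths


def ndde(points: List[Point]) -> float:
    """Stroke length between the highest- and lowest-direction points, over total length."""
    dirs = directions(points)
    lengths = arc_lengths(points)
    i_hi = max(range(len(dirs)), key=dirs.__getitem__)
    i_lo = min(range(len(dirs)), key=dirs.__getitem__)
    return abs(lengths[i_hi] - lengths[i_lo]) / lengths[-1]


def dcr(points: List[Point]) -> float:
    """Maximum direction change divided by average direction change."""
    dirs = directions(points)
    changes = [abs(b - a) for a, b in zip(dirs, dirs[1:])]
    mean = sum(changes) / len(changes)
    return max(changes) / mean if mean else 0.0


# A smooth semicircular arc versus an L-shaped polyline.
arc = [(math.cos(math.pi * t / 49), math.sin(math.pi * t / 49)) for t in range(50)]
corner = [(float(x), 0.0) for x in range(10)] + [(9.0, float(y)) for y in range(1, 10)]
print(round(ndde(arc), 2), round(dcr(arc), 2))        # high NDDE, DCR near 1
print(round(ndde(corner), 2), round(dcr(corner), 2))  # lower NDDE, high DCR
```
&lt;/pre&gt;On these two toy strokes the arc gets a high NDDE and a DCR near 1, while the L-shaped polyline gets a lower NDDE and a much higher DCR, which matches the intuition in the summary.&lt;br /&gt;&lt;br /&gt;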
The ranking procedure tends to choose simpler models (those requiring fewer primitives), based on empirically set costs for certain primitives.&lt;br /&gt;&lt;br /&gt;Their system runs very quickly compared to a modified Sezgin recognizer, taking only half the time, on average, to classify a stroke (though both are very fast). Their method also achieved an impressive 98-99% accuracy classifying test shapes. Their difficulties arose when the system had to decide between an accurate polyline fit and a simpler model of fewer primitives. Their method consistently outperformed Sezgin's algorithm and is able to classify many more types of strokes.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;First, it's nice to see a paper that clearly defines its thresholds and constants. Second, it's nice to see a paper that gets such high accuracy on many types of primitives and on their combination into complex shapes. I was starting to get discouraged that the problem of sketch recognition was 'too hard.'&lt;br /&gt;&lt;br /&gt;The authors mention detecting overtraced circles. I wonder about overtraced lines, say if the user wanted to make sure there was a connection between two objects. I assume that currently these would be treated like a polyline that just doubles back on itself. I wonder how difficult it would be to detect overtraced lines and construct 'heavy' lines out of them.&lt;br /&gt;&lt;br /&gt;Also, the authors have no detection methods for anything rectangular. It would be nice to see a four-segment polyline turned into a square or rectangle. Furthermore, it seems like the hierarchical fitting system could be extended to arbitrary regular polygons. Of course, the difficulty lies in deciding how many "automatic" shapes are too many to compare against. 
However, if the classification is hierarchical, perhaps these tests would only be performed if the stroke was closed, and the stroke would then be matched to the appropriate n-gon depending on the number of sub-strokes (lines in the polyline).&lt;br /&gt;&lt;br /&gt;I thought the ranking system was very original. I felt this was one of the best points of the paper, and judging by the results in Table 1, it adds more accuracy by itself than the new features do. I wonder if there is a way to compute ranking costs other than empirical observation. Perhaps not, since judgments of accuracy are going to be empirical anyway (does the user think it looks right?).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-6789717798653970030?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/6789717798653970030/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=6789717798653970030&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/6789717798653970030'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/6789717798653970030'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2007/09/paulson-and-hammond-new-features-and.html' title='Paulson and Hammond -- New Features and Ranking'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-2884081558302515130</id><published>2007-09-27T11:24:00.000-05:00</published><updated>2007-12-11T22:27:36.537-06:00</updated><title type='text'>Abam - Streaming Algorithms for Line Simplification</title><content type='html'>Abam, Mohammad Ali, et al. "Streaming Algorithms for Line Simplification." SCG, 2007.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Abam et al. present a few algorithms for simplifying a path, given as a sequence of points, into a representation with fewer points. The focus of the algorithms seems to be tracking an object moving through space. After a certain amount of time, the finite storage space of the system doing the tracking is exhausted. To compact the path representation, line simplification algorithms are used to modify the stored path and reduce storage requirements.&lt;br /&gt;&lt;br /&gt;They consider two distance metrics to measure the error of their simplified path relative to the real path. The Hausdorff distance between two point sets is the largest distance from any point in one set to its nearest point in the other set. This differs from the complete-linkage distance used in agglomerative hierarchical clustering, which takes the maximum distance over all pairs of points. The other distance metric is called the Frechet distance, and can be thought of as the minimum length of a leash needed so that a man can walk on one line and his dog on the other line, where either may stand still but neither may walk backward.&lt;br /&gt;&lt;br /&gt;The algorithms generally work by starting with a certain number of points in the line simplification, called set Q. 
As the algorithm progresses, the point in Q with the minimum error (distance from the true path) is removed from the simplification, so the simplification gets a little worse, but not by much.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;This was, simply put, a hard paper to read.&lt;br /&gt;&lt;br /&gt;I can see the applications of this paper. For path simplification in a situation where the entire, detailed history of an object's motion is either not necessary or too large to fit in memory, these algorithms work great. However, I'm not sure how these algorithms apply to sketch recognition. It seems like we would lose far too much detail in the simplification for the result to be of any use to recognizers later. &lt;br /&gt;&lt;br /&gt;I think stroke compression for sketch recognition could be better performed in two steps. First, resampling can be used to reduce the number of points while maintaining whatever granularity of detail is desired. Second, a sketch is inherently made up of primitives, each of which can be replaced by its cleaned version to reduce the amount of space needed to represent it. 
For example, a line or polyline could be represented by just its endpoints and corners, and an arc by its endpoints and radius.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-2884081558302515130?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/2884081558302515130/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=2884081558302515130&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/2884081558302515130'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/2884081558302515130'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2007/09/abam-streaming-algorithms-for-line.html' title='Abam - Streaming Algorithms for Line Simplification'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-1344824277676667485</id><published>2007-09-20T14:22:00.000-05:00</published><updated>2007-09-20T15:25:44.991-05:00</updated><title type='text'>Kim and Kim - Curvature Estimation for Segmentation</title><content type='html'>Kim, Dae Hyun, and Myoung-Jun Kim. "A Curvature Estimation for Pen Input Segmentation in Sketch-Based Modeling." 2006.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Kim and Kim look at the issue of using curvature to determine segmentation points for pen-drawn strokes. 
This is a recurring theme in the literature, appearing in both Sezgin's and Yu's work. Kim and Kim contribute two new methods of computing curvature. They assume that every point drawn by the user is intentional and that there is no noise in a gesture.&lt;br /&gt;&lt;br /&gt;The authors perform a sort of smoothing on the dataset, resampling the stroke points so that consecutive points are approximately a fixed distance apart. The specifics of how this is done are not discussed, but they do say they use a distance of 5 pixels. This resampled set of points makes up the stroke. The change in direction is then computed at each point as usual.&lt;br /&gt;&lt;br /&gt;Curvature is defined as the change in direction divided by the length over which the change occurs. However, because the stroke has been resampled and consecutive points are now approximately the same distance apart, curvature simplifies to just the change in direction at each point.&lt;br /&gt;&lt;br /&gt;The authors then propose two methods for adjusting the curvatures based on support neighborhoods around each point. The neighborhoods initially have size k, so the support for point p_i is the points p_(i-k) through p_(i+k). The authors look at the convexity of the curvature changes moving away from p_i, adding the curvature of point p_j (for i-k &lt;= j &lt; i and i &lt; j &lt;= i+k, moving away from p_i in either direction) only if the curvature of p_j has the same sign as the curvature of p_i. In other words, any curvature occurring in the same direction (clockwise or counter-clockwise) is added together to bolster the curvature of p_i (within the window of k points on either side of p_i).&lt;br /&gt;&lt;br /&gt;This is a problem when several consecutive points (say, on an arc) all have the same sign. In this case, the adjusted curvature values of those points would all be about equal, since each one gets the sum of the curvatures in its neighborhood. 
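&lt;br /&gt;&lt;br /&gt;To make the resampling and convexity-based support concrete, here is a minimal Python sketch built on my own assumptions: resampling walks along the stroke and emits a point every 5 pixels of path length by linear interpolation (the paper does not say how it resamples), curvature is the signed change in direction at each resampled point, and the support walk in each direction stops at the first sign change, which is my reading of the convexity test. The function names resample, direction_changes, and convexity_support are hypothetical. On a uniform arc, the interior points all come out with nearly the same adjusted value, which is the weakness described above.&lt;pre&gt;
import math


def resample(points, spacing=5.0):
    """Walk along the polyline and emit a point every `spacing` units of path length.

    Linear interpolation is an assumption; the paper only gives the 5 pixel spacing.
    """
    out = [points[0]]
    leftover = 0.0  # path length travelled since the last emitted point
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        d = spacing - leftover  # distance into this segment of the next sample
        while seg >= d:
            r = d / seg
            out.append((x0 + r * (x1 - x0), y0 + r * (y1 - y0)))
            d += spacing
        leftover = seg - (d - spacing)
    return out


def direction_changes(points):
    """Signed change in direction (radians) at each interior point.

    With roughly equal spacing this is used directly as the curvature.
    """
    changes = []
    for a, b, c in zip(points, points[1:], points[2:]):
        turn = (math.atan2(c[1] - b[1], c[0] - b[0])
                - math.atan2(b[1] - a[1], b[0] - a[0]))
        changes.append((turn + math.pi) % (2 * math.pi) - math.pi)  # wrap to [-pi, pi)
    return changes


def convexity_support(curv, k):
    """Bolster each curvature with same-sign neighbours up to k points away.

    Walking outward from point i in each direction, a neighbour is added
    while its curvature has the same sign as curv[i]; the walk stops at
    the first sign change or after k steps.
    """
    n = len(curv)
    adjusted = []
    for i, c in enumerate(curv):
        total = c
        for step in (-1, 1):
            j = i + step
            while j in range(n) and k >= abs(j - i) and curv[j] * c > 0:
                total += curv[j]
                j += step
        adjusted.append(total)
    return adjusted


if __name__ == "__main__":
    # A half circle of radius 50, densely sampled, then resampled every 5 px.
    arc = [(50 * math.cos(math.pi * t / 100), 50 * math.sin(math.pi * t / 100))
           for t in range(101)]
    curv = direction_changes(resample(arc, 5.0))
    support = convexity_support(curv, 3)
    print([round(s, 3) for s in support[3:-3]])  # interior values are nearly identical
&lt;/pre&gt;&lt;br /&gt;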
To fix this problem, the authors add a second constraint: not only must the curvature of a neighboring point have the same sign, but the curvature values must also be monotonically decreasing as you move away from p_i in order to be added. This gives larger adjusted curvatures to points at local peaks of curvature, so the points along an arc no longer all look alike.&lt;br /&gt;&lt;br /&gt;The authors compare their curvature estimation methods with the circumscribing circle and curve bending function methods, finding theirs to be comparable and arguing that their approach is justified. They then present an evaluation experiment showing the results of gesture segmentation based on support neighborhoods alone, on convexity information alone, on the combined convexity and monotonicity measures, and on another method for comparison. They claim 95% success in detecting the correct segmentation points.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;It seems strange to me that the authors assume that all the input points are intentional and that there is no noise. But then, they resample the data! If you assume there is no noise in the data and that all the points are intentional, why throw away features by resampling? They show how it makes the curvature calculations simpler, but is computing the curvature really that hard to begin with? Even if you have to divide by the length of the segment to calculate the initial curvature at each point, you can still use the convexity and monotonicity checks to weight each curvature. I also have a problem with the assumption that there is no noise to begin with. I understand that you either assume noise or you don't, and that either way has its pros and cons. 
But until input devices become 100% accurate, and until the striated muscle fibers in my arm, hand, and fingers quit tensing and relaxing multiple times per second (hold your hand out in front of you and try to keep it perfectly still; you can't unless you're a robot, and even then only if you assume outside factors don't affect your motion), there will ALWAYS be noise. Now, the hard part is determining what is true noise and what is intentional deviation. And guess what, this is also impossible, because what looks statistically like noise might really be me intentionally making small serrations in my otherwise-straight line. There is absolutely no way to guess user intention, in a general sense, with 100% accuracy (not even a human can do that 100% of the time, much less a computer).&lt;br /&gt;&lt;br /&gt;&amp;lt;/rant&amp;gt;&lt;br /&gt;&lt;br /&gt;However, aside from that soap-box issue, I love the way curvatures are weighted in the paper. It's nice to see curvature weighting methods, especially since I'm facing the problem in my own segmentation implementations of figuring out which of the gobs of curvature-based segmentation options to use. I also like how they use speed information to change the threshold for deciding whether a curvature change is intentional, assuming that lower speeds mean more intentional movements and giving more weight to the curvatures there. 
I do wish, however, that this had been discussed more than in passing.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-1344824277676667485?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/1344824277676667485/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=1344824277676667485&amp;isPopup=true' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/1344824277676667485'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/1344824277676667485'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2007/09/kim-and-kim-curvature-estimation-for.html' title='Kim and Kim - Curvature Estimation for Segmentation'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-8991535428757982058</id><published>2007-09-18T12:00:00.000-05:00</published><updated>2007-09-18T12:38:10.680-05:00</updated><title type='text'>Yu - Corners and Shape recognition</title><content type='html'>Yu, Bo, and Shijie Cai. "A Domain-Independent System for Sketch Recognition." 2003.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;The authors present work built heavily on Sezgin's basic corner detection algorithms. Their system recognizes primitives (line segments, arcs, circles...) and puts them together via basic object recognition. 
The deficiencies in current systems that they wish to address are strict restrictions on drawing style and the inability to decompose curves (smooth or hybrid) into their constituent primitives. To this end, they want to let users draw naturally (multiple strokes are fine), provide consistent recognition results, group primitives hierarchically, and beautify noisy strokes into higher-level shapes.&lt;br /&gt;&lt;br /&gt;First, the system performs stroke approximation immediately after the stroke is drawn. The points of the stroke, consisting of (x, y, timestamp) tuples, are examined, and direction and curvature graphs are computed much as in Sezgin's methods (curvature is defined slightly differently here). Vertices are declared to be the peaks in curvature. To approximate a stroke with primitives, the system first tries to fit the entire stroke with a single primitive. If this fails, the stroke is split into substrokes at the point of highest curvature, and approximation recurses on the substrokes until every portion of the stroke is approximated by a primitive.&lt;br /&gt;&lt;br /&gt;Line approximation is tried first. A stroke is checked for being a line by examining the direction graph (which should be horizontal, with no changes in direction) and the actual points in the stroke (which should fit a straight line). If either of these fits a best-fit line (using a least-squares fit) with few "deviations", a line is fit between the stroke endpoints. Valid fits are those that minimize the ratio of feature area to standard area.&lt;br /&gt;&lt;br /&gt;If a stroke is not a line, it is passed to the curve approximation routine. The direction graph for a smooth curve should be linear (a constant rate of change in direction). Their system can fit circles, ellipses, and arcs. It can even fit overtraced curves (where the direction graph spans more than 2*pi) by splitting the direction graph into lengths of 2*pi and fitting each length. If the resulting circles are similar, they are averaged. 
If instead their radii decrease, the stroke is fit as a helix. Like lines, circles and arcs are fit by computing the ratio of feature area to standard area and seeking to minimize this ratio.&lt;br /&gt;&lt;br /&gt;Post-processing includes removing noisy points (which occur at the beginning or end of a stroke), merging similar strokes, and inferring constraints such as connectivity, parallelism, and perpendicularity. Sets of heuristics can then combine primitives (such as four line segments) into higher-level objects (such as rectangles).&lt;br /&gt;&lt;br /&gt;Experimentally, their system identifies primitive shapes and polylines with 98% accuracy, and arcs alone with 94% accuracy (crude arcs are incorrectly recognized as a set of lines and arcs). Higher-level shapes combining primitives were recognized only about 70% of the time, with problems at smooth connections (since their method /only/ uses curvature).&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;It seems like their definition of curvature differs from Sezgin's, normalizing the change in direction over a neighborhood by the stroke length. I wonder how this compares to Sezgin's as far as vertex prediction goes. I was also surprised they did not use speed data. I think Sezgin made a good case for using both speed and curvature.&lt;br /&gt;&lt;br /&gt;This paper is neat in that it uses primitive approximations to drive vertex detection, rather than finding vertices and then trying to fit primitives to the spaces between them. I think speed data could and should have been used in some sort of hybrid method like Sezgin's, where instead of Sezgin's "just add until we get below a certain error", we pick the vertex with the highest certainty metric, use it to divide the stroke, and try to fit primitives to the substrokes. 
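&lt;br /&gt;&lt;br /&gt;The fit-or-split recursion described in the summary might look something like this Python sketch, with the split rule left pluggable so that a speed-plus-curvature certainty score like the one I'm proposing could be dropped in. It rests on my own assumptions: only a line fitter is included, and its test (every point within a hypothetical tolerance tol of the chord between the endpoints) is a crude stand-in for the paper's least-squares and feature-area checks. The names approximate, try_line, and max_curvature_split are illustrative, not the authors' code.&lt;pre&gt;
import math


def chord_deviation(points):
    """Largest perpendicular distance from any point to the endpoint chord."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    length = math.hypot(x1 - x0, y1 - y0)
    if length == 0:
        return max(math.hypot(x - x0, y - y0) for x, y in points)
    return max(abs((x1 - x0) * (y0 - y) - (x0 - x) * (y1 - y0)) / length
               for x, y in points)


def try_line(points, tol=2.0):
    """Stand-in line test: accept if every point lies within tol of the chord."""
    if tol >= chord_deviation(points):
        return ("line", points[0], points[-1])
    return None


def turn_at(points, i):
    """Absolute change in direction at interior point i."""
    a, b, c = points[i - 1], points[i], points[i + 1]
    turn = (math.atan2(c[1] - b[1], c[0] - b[0])
            - math.atan2(b[1] - a[1], b[0] - a[0]))
    return abs((turn + math.pi) % (2 * math.pi) - math.pi)


def max_curvature_split(points):
    """Split index: the interior point with the sharpest turn."""
    return max(range(1, len(points) - 1), key=lambda i: turn_at(points, i))


def approximate(points, fitters=(try_line,), split=max_curvature_split):
    """Try each primitive on the whole stroke; otherwise split and recurse."""
    for fit in fitters:
        primitive = fit(points)
        if primitive is not None:
            return [primitive]
    if 3 > len(points):
        # Nothing left to split: fall back to a segment between the endpoints.
        return [("line", points[0], points[-1])]
    i = split(points)
    return (approximate(points[:i + 1], fitters, split)
            + approximate(points[i:], fitters, split))


if __name__ == "__main__":
    # An L-shaped stroke: right along the x-axis, then straight up.
    stroke = [(x, 0) for x in range(11)] + [(10, y) for y in range(1, 11)]
    print(approximate(stroke))
    # [('line', (0, 0), (10, 0)), ('line', (10, 0), (10, 10))]
&lt;/pre&gt;&lt;br /&gt;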
Basically, a combination of the two methods.&lt;br /&gt;&lt;br /&gt;Again, we have a lot of talk about thresholds and limits without any description of what they are, how they were derived, or why they are valid. I found it amusing that they say "here [looking at curvature data] we avoid any empirical threshold or statistical calculation for judging vertices." Well sure, because they push all that hand-waving down into the primitive approximation processes, which, in the end, still amounts to judging vertices. Examples of thresholds that are not well defined:&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;Line approx - number of points "deviating" from best-fit line &lt;ul&gt;&lt;li&gt;What does it mean to deviate?&lt;/li&gt;&lt;li&gt;What's the "relatively loose" threshold?&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Cleanup - "very short" line segments for deletion/merging: how short?&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-8991535428757982058?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/8991535428757982058/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=8991535428757982058&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/8991535428757982058'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/8991535428757982058'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2007/09/yu-corners-and-shape-recognition.html' title='Yu - Corners and Shape recognition'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-3151412643375726731</id><published>2007-09-13T11:46:00.000-05:00</published><updated>2007-09-18T12:00:02.108-05:00</updated><title type='text'>Early Processing for Sketch Understanding</title><content type='html'>Sezgin, Tevfik Metin, et al. "Sketch Based Interfaces: Early Processing for Sketch Understanding." PUI 2001.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Sezgin et al. seek to develop a system that lets users draw sketches directly, without restrictions (on the number or direction of strokes), and have those sketches "understood" (the approximation phase) as their component lines and arcs. The system can then combine these into low-level primitives (rectangles, squares, circles, etc.) as appropriate, beautify the drawing (the second phase), and perform higher-level understanding on the primitives (the basic recognition phase).&lt;br /&gt;&lt;br /&gt;Their approximation subsystem, which divides strokes into line segments and arcs, works by detecting 'corners', the points where two line segments connect. Corners are detected by examining the speed (intuitively, users tend to slow down when they draw meaningful corners) and the curvature (a corner is, obviously, a change in direction) of the sketch. Average-based filtering is used to detect speed minima (low speeds denote corners), F_s, and curvature maxima (large changes in direction denote corners), F_d. 
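&lt;br /&gt;&lt;br /&gt;Here is a minimal Python sketch of average-based filtering as I understand it, assuming per-point curvature and speed values have already been computed. The mean of each signal serves as the threshold; points are grouped into contiguous runs beyond that threshold, and only the most extreme point of each run is kept as a candidate vertex. The 0.9 scaling applied to the mean speed in the example is an assumption on my part, and the function names regions and average_filter are hypothetical.&lt;pre&gt;
def regions(mask):
    """Yield (start, end) index pairs for each maximal run of True values."""
    start = None
    for i, flag in enumerate(mask + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            yield start, i
            start = None


def average_filter(values, above=True, factor=1.0):
    """Keep one extreme point per run beyond factor * mean(values).

    above=True finds curvature maxima (runs above the threshold, keep each
    run's maximum); above=False finds speed minima (runs below it, keep
    each run's minimum).
    """
    threshold = factor * sum(values) / len(values)
    if above:
        mask = [v > threshold for v in values]
        pick = max
    else:
        mask = [threshold > v for v in values]
        pick = min
    return [pick(range(s, e), key=lambda i: values[i]) for s, e in regions(mask)]


if __name__ == "__main__":
    curvature = [0, 1, 5, 1, 0, 0, 3, 0]
    speed = [5, 5, 1, 2, 5, 5, 0.5, 5]
    f_d = average_filter(curvature, above=True)
    f_s = average_filter(speed, above=False, factor=0.9)
    print(f_d, f_s)  # [2, 6] [2, 6]
&lt;/pre&gt;&lt;br /&gt;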
The set of vertices that denote corners in the drawing is compiled from F_d and F_s using an iterative technique: at each step it considers the most "certain" remaining vertex from each set (certainty is computed differently for speed and curvature, and roughly measures how 'significant' the speed or curvature value is at that point) and picks between the two using least-squares error fitting. The iterations continue, adding points from both the speed and curvature sets, until the error of the fit drops below a certain threshold.&lt;br /&gt;&lt;br /&gt;Arcs are fit with Bezier curves (consisting of two endpoints and two control points). First, arcs are detected by comparing the straight-line distance between consecutive vertices (from corner detection) with the arc length along the stroke between them (the sum of the distances between consecutive points). Straight segments have a ratio of arc length to straight-line distance close to 1, while curved segments have a ratio much higher than 1 (since the path between the vertices is longer than the straight line between them). Control points are then calculated from the tangent vectors to the arc.&lt;br /&gt;&lt;br /&gt;Once lines and arcs have been identified and represented using vertices and Bezier control points, the drawing can be beautified, not only by drawing lines straighter but also by rotating lines that are supposed to be parallel until they actually are. Beautified sketches can be compared to templates to perform basic object recognition.&lt;br /&gt;&lt;br /&gt;They evaluated their system subjectively, asking human subjects whether they liked the system (9 out of 10 said they did). 
The authors also claim a 96% success rate in identifying true vertices.&lt;br /&gt;&lt;br /&gt;Their work improves on earlier work because vertex recognition is completely automatic and it allows for more natural sketching (the user does not have to lift the pen, stop drawing, or press buttons to denote vertices). In the future, the authors hope to use techniques such as scale space theory to reduce the need for hard-coded thresholds. They also want to explore intentional changes in speed by the user as an indication of how precise they want their sketch to be (presumably requiring less beautification for a careful sketch).&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;So many details are omitted, it's crazy! I understand the need to be concise and fit things into a specified number of pages, but these seem like major operating details.&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;br /&gt;&lt;li&gt;What is the error threshold for stopping the addition of points to the final set of vertices, H_f? Why is this value hard-coded rather than based on some sort of statistical test?&lt;/li&gt;&lt;br /&gt;&lt;li&gt;In the average-based filtering, why is the mean threshold hard-coded? They do cite this as a motivation for scale space theory, and they want to keep things simple, so it's understandable.&lt;/li&gt;&lt;li&gt;They talk about fitting an arc to a segment if the ratio of stroke length to Euclidean distance is significantly higher than one. Significant by what measure, and how much higher?&lt;/li&gt;&lt;li&gt;When they rotate line segments to make things parallel/perpendicular, what happens to vertices that no longer touch (line segments that no longer meet)? Or do they adjust connected lines at the same time?&lt;/li&gt;&lt;li&gt;Why not beautify curves? Or do they? 
I don't think they made it clear whether they did.&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-3151412643375726731?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/3151412643375726731/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=3151412643375726731&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/3151412643375726731'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/3151412643375726731'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2007/09/early-processing-for-sketch.html' title='Early Processing for Sketch Understanding'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-4692236976356748302</id><published>2007-09-10T16:41:00.000-05:00</published><updated>2007-09-10T17:18:54.058-05:00</updated><title type='text'>MARQS</title><content type='html'>Brandon Paulson and Tracy Hammond, &lt;i&gt;MARQS: Retrieving Sketches Using Domain- and Style-Independent Features Learned from a Single Example Using a Dual-Classifier&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Paulson and Hammond implement a system that can be used to search a database of sketches and rank matches from the database with a high degree of accuracy. 
Matching is performed with a new dual-classifier algorithm and new sketch features that allow for unconstrained sketches (ones that can vary in the number of strokes, ordering of strokes, scale, or orientation), that can be trained on a single example (the first example of a sketch in the database that we need to query against), and that can learn over time from successful queries.&lt;br /&gt;&lt;br /&gt;The four features used by the authors are the aspect ratio of the bounding box, the pixel density of the bounding box, the average curvature of the strokes in the sketch, and the number of perceived corners in the sketch. Before the features are computed, each sketch is rotated so that its major axis (defined as the line between the pair of points farthest from each other) is horizontal.&lt;br /&gt;&lt;br /&gt;When only one example of a given class exists in the database, queries are classified using a 1-nearest-neighbor approach. That is, the features of the query sketch are compared to the features of every sketch in the database, and the database sketch with the smallest difference in feature values is selected as the match for that query. When more sketches of the same class are added to the database, the algorithm switches to a linear classifier (similar to Rubine's method) for greater accuracy and speed.&lt;br /&gt;&lt;br /&gt;In an experiment, the authors had 10 users each create 10 examples of their own sketch. The first example was inserted into the database as the initial key, and the remaining 9 were used as queries. Additionally, the authors added 10 examples each of 5 sketches to test the effects of orientation and scale more precisely. The experiment was run 10 times, with the ordering of the gestures used as insertions/queries randomized. The average rank of the correct sketch for a query was 1.51. 
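&lt;br /&gt;&lt;br /&gt;To make the rank numbers concrete, here is a rough Python sketch of the single-example (1-nearest-neighbor) path. The assumptions are all mine: the major axis is found by brute force over all pairs of points, only the aspect-ratio feature is actually computed (the other three features are passed in as precomputed numbers), and the difference between sketches is a plain sum of absolute feature differences with no normalization across features. The names rotate_major_axis, aspect_ratio, and rank_matches are hypothetical. The rank of a query is the position (starting at 1) of the correct sketch in the list that rank_matches returns.&lt;pre&gt;
import itertools
import math


def rotate_major_axis(points):
    """Rotate the sketch so its farthest-apart pair of points lies horizontal."""
    p, q = max(itertools.combinations(points, 2), key=lambda pair: math.dist(*pair))
    angle = math.atan2(q[1] - p[1], q[0] - p[0])
    c, s = math.cos(-angle), math.sin(-angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def aspect_ratio(points):
    """Bounding-box height over width (after rotation, width is the major axis)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    return height / width if width else 0.0


def rank_matches(query_features, database):
    """Database names sorted from most to least similar to the query."""
    def difference(name):
        return sum(abs(a - b) for a, b in zip(query_features, database[name]))
    return sorted(database, key=difference)


if __name__ == "__main__":
    # Hypothetical feature vectors: [aspect, density, avg curvature, corners].
    database = {"house": [0.9, 0.3, 0.2, 5], "arrow": [0.4, 0.2, 0.1, 3]}
    diagonal = rotate_major_axis([(0, 0), (1, 1), (2, 2), (1, 1.2)])
    query = [aspect_ratio(diagonal), 0.25, 0.1, 3]
    ranking = rank_matches(query, database)
    print(ranking, "rank of arrow:", ranking.index("arrow") + 1)
&lt;/pre&gt;&lt;br /&gt;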
The right answer was the top choice 70% of the time, in the top two 87.6% of the time, in the top three 95.5% of the time, and in the top four 98% of the time.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;The authors mentioned that using the single-example classifier to match the first query for a gesture class takes a long time, O(nf), where n is the number of items in the database and f is the number of features. They mentioned using some sort of generalization of the features across all examples in a class to speed things up, instead of comparing against every example within the class, at a possible cost in accuracy. I wonder whether some sort of database indexing scheme based on the various features could give a quick lookup of matches. One might even use the generalized features to find the class of sketches that looks most like the query, and then compare against the individual sketches in that class for a more specific match.&lt;br /&gt;&lt;br /&gt;It seems like orientation may be important for distinguishing between many gestures. I might not want a sad face to be classified the same as a happy face. The authors mentioned moving to a grid-based approach if this was the case. I wonder if some sort of thresholding could be applied so that large changes in orientation count as differences, but small changes are ignored. This might allow for a bit of wobble in the user's drawing while still counting large differences (most likely made on purpose). Of course, this seems impossible given the current method of rotating the image to make its major axis horizontal, unless one computed the orientation features before the rotation.&lt;br /&gt;&lt;br /&gt;It also seems like a very difficult problem to extract images from an electronic journal, separating them from the other objects (other sketches, text, ...) 
that surround them on the digital page. The future work section calls this perceptual grouping, and it seems like a very difficult task. I can imagine scenarios where I draw a particular molecule in my chemistry e-notes and then annotate it with information like bond energies. When I search for it later, I don't include the annotations, just the shape of the molecule. Is it possible to extract such information? Perhaps by using some sort of layering mechanism where strokes are stored as layers on top of one another, so that the text annotations can be separated from the drawing of the molecule. Very difficult, indeed...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-4692236976356748302?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/4692236976356748302/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=4692236976356748302&amp;isPopup=true' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/4692236976356748302'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/4692236976356748302'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2007/09/marqs.html' title='MARQS'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-977487209337976633</id><published>2007-09-04T23:58:00.000-05:00</published><updated>2007-09-10T16:41:30.021-05:00</updated><title 
type='text'>Visual Similarity of Pen Gestures</title><content type='html'>A. Chris Long, Jr., et al., &lt;i&gt;Visual Similarity of Pen Gestures&lt;/i&gt;, 2000&lt;br /&gt;A. Chris Long, Jr., et al., &lt;i&gt;“Those Look Similar!” Issues in Automating Gesture Design Advice&lt;/i&gt;, 2001&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Long et al. set out to determine what makes gestures similar or dissimilar, based on human perception. Two sets of experiments were conducted to determine this. In the first experiment, 20 people were shown 14 gestures in all possible groups of 3 and asked to pick the most dissimilar gesture from each triad. The gestures covered a wide variety of types and orientations. Multi-dimensional scaling (MDS) was then used to describe, in terms of geometric properties of the gestures, what made different gestures similar or dissimilar. Five dimensions of geometric attributes were selected, which were then mapped, using linear regression, to measurable features of the gestures (e.g., Rubine’s features). The derived model correlated 0.74 with the subjects’ perceptions of similarity.&lt;br /&gt;&lt;br /&gt;The second trial was set up to test the predictive model from the first experiment, as well as to test how varying assorted features affected similarity (total absolute angle and aspect, length and area, and rotation and its related features). The procedure was the same as in the first trial: test subjects were shown triads and asked to choose the gesture that was most dissimilar from the other two. When the three sets of varied features were fit to a predictive model, the angle of the bounding box and the gesture’s alignment with the coordinate axes turned out to play the most significant roles in determining similarity. 
It was also found that the model generated in the first trial had more predictive power than the model from this trial.&lt;br /&gt;&lt;br /&gt;One of the more interesting results was that most of the similarity between gestures could be explained by a handful of features (which accounted for three of the dimensions in the MDS). The remaining MDS dimensions required many more features to describe. The authors attributed the difficulty of predicting similarity to the complexity of the models, the limited amount of training data, and the subjectivity of similarity judgments from one person to another.&lt;br /&gt;&lt;br /&gt;In the shorter paper, Long et al. use the similarity prediction models (trained on even more data) to build a tool that helps designers of gesture sets create gestures that people perceive as dissimilar (so that gestures are easier to remember, something designers do poorly on their own) and that the computer finds dissimilar (so that gestures can be recognized accurately). The advice, as it is called, about gestures that are too similar for the computer is given after the user trains the new gesture class, rather than as soon as the first example is drawn, because later examples may alter the average features of the class enough to make it significantly dissimilar to the computer. However, if the analysis deems the gesture indistinguishable to humans, the advice is given immediately, because more examples won’t change its basic shape.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;First, one of the equations in the long Long paper is wrong. The formula for the Minkowski distance metric should be (in LaTeX-ese):&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;d_{ij} = \left( \sum_{a=1}^{r} | x_{ia} - x_{ja} |^p \right)^{1/p}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The paper is missing the (...)^{1/p} around the summation. 
Otherwise, Euclidean distance would be missing its square root.&lt;br /&gt;&lt;br /&gt;I thought it was interesting that even though the log of the aspect is a monotonic function of the actual aspect, the log of the aspect outperforms the plain aspect in predicting similarity. This actually boggled me, as I’m used to treating things like log likelihoods and likelihoods as basically equivalent (i.e., maximizing one obviously maximizes the other). So why don’t similarities in aspect land translate into similarities in log-aspect land?&lt;br /&gt;&lt;br /&gt;It annoys me when people complain about not having enough training data to model all the connections between similarities, etc. Isn’t that the point? If you had an infinite amount of training data, all these fancy algorithms would be pretty worthless: you’d just look at all your data and pick the right answer. It’s like having an Oracle for the A_TM (accepting Turing machine) problem; if you have an Oracle, the problem just disappears. The point of this exercise is that you don’t have enough data. You will never have enough data. Even if you did, you’d need some magical way to process it all efficiently enough to extract any information in a useful amount of time. Your methods and algorithms should deal with this shortcoming in a way that is appropriate to the domain.&lt;br /&gt;&lt;br /&gt;Other than that rant, it was neat to see how the authors used MDS to tie the psychological and very subjective portion of the experiment—determining gesture similarity—into something mathematically well defined and robust—a linear model fit with regression. I’m surprised that the models were able to do better than random at predicting which gestures were similar, given the complexity of the domain and the nuances in human judgment (where by nuances I mean silly fickleness). 
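&lt;br /&gt;&lt;br /&gt;Going back to the Minkowski formula and the aspect versus log-aspect puzzle, a few lines of Python suggest one likely explanation. A monotonic transform like log preserves the ordering of individual feature values, but similarity is judged from differences between gestures, and log does not preserve the ordering of differences: it compresses gaps between large values and stretches gaps between small ones. The one-feature aspect values below are made up for illustration (they are not data from the paper), and nearest is a hypothetical helper.&lt;pre&gt;
import math


def minkowski(x, y, p=2.0):
    """(sum over a of |x_a - y_a|^p)^(1/p); p=2 gives Euclidean distance."""
    return sum(abs(xa - ya) ** p for xa, ya in zip(x, y)) ** (1.0 / p)


def nearest(query, candidates, transform=lambda v: v):
    """Name of the candidate closest to the query after transforming each feature."""
    q = [transform(v) for v in query]
    return min(candidates,
               key=lambda name: minkowski(q, [transform(v) for v in candidates[name]]))


if __name__ == "__main__":
    gestures = {"b": [0.1], "c": [2.0]}  # made-up one-feature aspect values
    print(nearest([1.0], gestures))            # raw aspect: prints b
    print(nearest([1.0], gestures, math.log))  # log-aspect: prints c
&lt;/pre&gt;&lt;br /&gt;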
It was even more interesting to see that the majority of differences from one gesture to another could be accounted for by modeling only a small subset of the features.&lt;br /&gt;&lt;br /&gt;And lastly, for my pithy quip:&lt;br /&gt;One of these things is not like the others...One of these things does not belong...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-977487209337976633?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/977487209337976633/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=977487209337976633&amp;isPopup=true' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/977487209337976633'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/977487209337976633'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2007/09/blog-post.html' title='Visual Similarity of Pen Gestures'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-7822899147753211345</id><published>2007-08-30T16:51:00.000-05:00</published><updated>2007-08-30T17:40:46.310-05:00</updated><title type='text'>Specifying Gestures by Example</title><content type='html'>"Specifying Gestures by Example," Dean Rubine, 1991&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Rubine developed an application that allows collections of highly accurate (97% accuracy) 
sketch recognizers that can be built and trained quickly and easily. More accurately, Rubine’s methods do not apply to sketches as one might commonly think of them, but are limited to single-stroke gestures (drawn without lifting the “pen” from the “paper”). This limitation rules out multi-stroke gestures, forcing unnatural ways of drawing certain gestures (e.g., drawing the letter “x” without lifting the pen) and introducing a bit of a learning curve, but it immensely simplifies the computations needed to recognize gestures.&lt;br /&gt;&lt;br /&gt;Built on top of his GRANDMA system, Rubine’s gesture-based drawing program (GDP) allows one to define gesture classes. A gesture class gives meaning to a gesture: for example, a stroke determined to belong to the “draw rectangle” gesture class will perform the operations defined by that class. After defining a new gesture class, the user enters several example strokes for that gesture, accounting for variability in the size and orientation of the stroke across the training examples. Semantic meanings can then be attached to the gestures so that operations are performed once they are recognized.&lt;br /&gt;&lt;br /&gt;Rubine defines 13 features that describe a stroke, including the initial angle, the size and dimensions of the stroke’s bounding box, the amount of rotation in the figure, etc., each computed from the sample points that make up the “line” drawn by the stroke (each sample point contains an x and y coordinate along with a timestamp).&lt;br /&gt;&lt;br /&gt;The features are used in a linear classifier to decide which gesture the stroke “looks like.” For each gesture class c, a weight is computed for every feature. To evaluate class c, the stroke’s feature values are multiplied by that class’s weights and summed, together with a constant weight for the class as a whole. 
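&lt;br /&gt;&lt;br /&gt;To make the scoring concrete, here is a minimal sketch in Python with NumPy. It is my own illustration under stated assumptions, not Rubine’s code. Each training stroke is assumed to be already reduced to a feature vector; the toy data uses two made-up features instead of Rubine’s 13. The class names and the 0.95 rejection threshold are hypothetical. NumPy’s pseudo-inverse stands in for however Rubine handles a singular covariance matrix. Training computes each class’s mean feature vector and one covariance matrix pooled over all classes, then turns its inverse into per-class weights plus a per-class constant. Classification scores every class and picks the highest. It then estimates how unambiguous that choice is with a softmax-style probability and rejects the stroke when the probability falls below the threshold.
&lt;pre&gt;&lt;code&gt;
```python
import numpy as np


def train(examples):
    """Fit a Rubine-style linear classifier (illustrative sketch).

    examples maps a class name to an array of shape (n_strokes, n_features),
    one row of feature values per training stroke.
    Returns (weights, biases): a weight vector and a constant per class.
    """
    classes = sorted(examples)
    means = {c: examples[c].mean(axis=0) for c in classes}
    n_features = examples[classes[0]].shape[1]

    # Pool the within-class scatter of every class into one covariance estimate.
    scatter = np.zeros((n_features, n_features))
    dof = 0
    for c in classes:
        centered = examples[c] - means[c]
        scatter += centered.T @ centered
        dof += len(examples[c]) - 1
    covariance = scatter / dof

    # Assumption: pseudo-inverse so a singular covariance does not crash training.
    inverse = np.linalg.pinv(covariance)
    weights = {c: inverse @ means[c] for c in classes}
    biases = {c: -0.5 * (weights[c] @ means[c]) for c in classes}
    return weights, biases


def classify(features, weights, biases, reject_below=0.95):
    """Return (best_class, probability); best_class is None if rejected.

    reject_below=0.95 is a hypothetical default, not taken from the paper.
    """
    x = np.asarray(features, dtype=float)
    # Score of each class: its constant plus the weighted sum of the features.
    scores = {c: biases[c] + weights[c] @ x for c in weights}
    best = max(scores, key=scores.get)
    # Softmax probability of the winner; subtracting the best score avoids overflow.
    probability = 1.0 / sum(np.exp(v - scores[best]) for v in scores.values())
    if probability >= reject_below:
        return best, probability
    return None, probability


# Toy training data with two made-up features per stroke (hypothetical values).
examples = {
    "rectangle": np.array([[0.0, 0.0], [1.0, 0.2], [0.2, 1.0], [1.0, 1.0]]),
    "ellipse": np.array([[5.0, 5.0], [6.0, 5.2], [5.2, 6.0], [6.0, 6.0]]),
}
weights, biases = train(examples)
label, p = classify([0.5, 0.5], weights, biases)
print(label)  # rectangle
label, p = classify([3.05, 3.05], weights, biases)
print(label)  # None: halfway between the classes, so it is rejected
```
&lt;/code&gt;&lt;/pre&gt;
A stroke exactly halfway between two class means scores about 0.5 on the probability estimate, which is why the second toy stroke is rejected rather than guessed.&lt;br /&gt;&lt;br /&gt;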
The class with the highest weighted sum is the class the stroke most looks like, and the stroke is classified as that gesture. The weights are computed from each class’s mean feature values and the inverse of the feature covariance matrix (estimated within each class and pooled across all classes). Inverse covariance is high when covariance is low, meaning independent features are weighted higher than highly correlated features. This makes sense because you don’t want highly correlated features to affect one another; you want clear-cut decisions. When a stroke could plausibly belong to more than one class, statistical methods can be used to reject classifications whose estimated probability of being correct falls below a threshold.&lt;br /&gt;&lt;br /&gt;To improve on his method, Rubine suggests eager recognition (resolving a stroke’s class as soon as it becomes unambiguous) and multi-finger recognition (using systems that recognize two strokes at the same time), the latter as a possible route to multi-stroke gestures.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;The gesture recognition system is incredibly simple on many fronts. It’s simple in that one can set up new gesture classes with a few mouse clicks, simple in that only a couple dozen training examples are needed per class, simple in its computation of features, and extremely simple in its use of a linear classifier to do its work. However, it is also simple in the types of gestures it can interpret (single strokes only, no segmented gestures) and in the number of gesture classes it can handle. In spite of its simplicity, or perhaps because of it, GDP seems to fill a nice niche. For simple sets of classes that can each be drawn decently with one stroke (for example, the Latin alphabet), this method is extremely accurate. 
Obviously it can’t be used for more complex things (differentiating a single-stroke armchair from a single-stroke couch in a system for interior designers might be a bit ridiculous), but complex strokes are not its domain.&lt;br /&gt;&lt;br /&gt;One thing I wondered about was the eager recognition technique. Rubine mentions that a stroke is classified to its gesture class as soon as the classification becomes unambiguous. Without digging up the referenced papers, I tried to think of ways he might do this. The easiest way seems to be the same probabilistic approach used to reject ambiguous classifications: keep track of a probability for each class, and when one class rises above some threshold (and the rest stay below it), make the decision right then. It seems like you could do this fairly cheaply (constant-time updates to features such as total rotation or total path length, and a simple recomputation of the distance between the start and end points).&lt;br /&gt;&lt;br /&gt;Also, to get free training examples, I wonder if Rubine adds successfully classified strokes to the training set (which would require recomputing the mean feature values, covariance matrices, and weights). 
This would provide free data (strokes the user hasn’t “undone”) and might help the recognizer learn a particular user’s style of drawing the strokes (over time, the user’s subtle nuances would dominate the feature means and allow the recognition to evolve).&lt;br /&gt;&lt;br /&gt;And in final contemplation… Different strokes for different folks (kill me now, but I couldn’t resist the temptation).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-7822899147753211345?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/7822899147753211345/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=7822899147753211345&amp;isPopup=true' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/7822899147753211345'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/7822899147753211345'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2007/08/specifying-gestures-by-example.html' title='Specifying Gestures by Example'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-7983848959954239731</id><published>2007-08-30T07:47:00.000-05:00</published><updated>2007-08-30T07:49:49.599-05:00</updated><title type='text'>Sketchpad</title><content type='html'>“Sketchpad: A Man-Machine Graphical Communication System,” Ivan Sutherland, 1963&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;The Sketchpad system 
was a 1960s-era computer program that allowed for dynamic and precise graphical input using a light pen and a bank of buttons, knobs, and toggle switches. Its extremely modal operation did not allow for context-driven interaction; each action required pressing a specific button or sequence of buttons, possibly in addition to actions with the light pen (the primary source of input).&lt;br /&gt;&lt;br /&gt;Sketchpad was fairly revolutionary in that it was one of the first systems to pave the way for the modern concepts of object-oriented programming and design. Sketchpad allowed drawings to be saved as symbols. These symbols could be opened in subsequent drawings as instances, or sub-pictures, and manipulated individually (scaling, rotating, etc., but not fundamentally changing the object’s definition). Constraints could also be placed on drawings to force relationships between objects, helping to beautify a drawing and control its precision (making lines parallel, the same length, etc.). Changes to the base symbol resulted in identical changes to every stored instance (sub-picture) of that symbol.&lt;br /&gt;&lt;br /&gt;Components of similar type (lines, points, constraints, etc.) were stored in circular doubly-linked lists, called rings. Additionally, all objects with common traits (such as all line segments terminating at a common point) were linked together. These lists allowed for easy insertion of new objects, deletion of old objects, and operations such as the merging of different objects (merging two points would modify the line segments that terminated at those points). The generic blocks stored in the linked lists made the code easy to manage and extend, and were precursors to the ideas of polymorphism and inheritance. Constraints were also represented as objects in rings, connected to the symbols and variables they control. 
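&lt;br /&gt;&lt;br /&gt;A ring translates naturally into modern code. The Python below is a minimal sketch of a circular doubly-linked list with a header node. It is written for illustration, not taken from Sketchpad, and the class and method names (Node, Ring, insert, remove, absorb) and the example data are hypothetical. It shows why the structure suits the operations described above. Inserting or deleting an element only relinks its two neighbors. Merging two points can splice one point’s ring of attached line segments onto the other’s in constant time; updating each segment’s own endpoint reference would be a separate step.
&lt;pre&gt;&lt;code&gt;
```python
class Node:
    """One ring element: a payload plus links to its neighbors."""

    def __init__(self, value=None):
        self.value = value
        self.next = self
        self.prev = self


class Ring:
    """Circular doubly-linked list with a header node that carries no payload.

    Hypothetical names; an illustration, not Sketchpad's implementation.
    """

    def __init__(self):
        self.head = Node()

    def insert(self, value):
        """Link a new element in just before the header (the end of the ring)."""
        node = Node(value)
        last = self.head.prev
        node.prev, node.next = last, self.head
        last.next = node
        self.head.prev = node
        return node

    @staticmethod
    def remove(node):
        """Unlink an element in constant time; no search through the ring is needed."""
        node.prev.next = node.next
        node.next.prev = node.prev
        node.next = node.prev = node

    def absorb(self, other):
        """Splice every element of other onto the end of self, leaving other empty."""
        if other.head.next is other.head:
            return  # nothing to move
        first, last = other.head.next, other.head.prev
        tail = self.head.prev
        tail.next, first.prev = first, tail
        last.next, self.head.prev = self.head, last
        other.head.next = other.head.prev = other.head

    def __iter__(self):
        node = self.head.next
        while node is not self.head:
            yield node.value
            node = node.next


# Hypothetical example: the rings of line segments ending at points A and B.
point_a, point_b = Ring(), Ring()
point_a.insert("line 1")
point_a.insert("line 2")
stray = point_b.insert("line 3")
point_b.insert("line 4")
Ring.remove(stray)       # delete "line 3"
point_a.absorb(point_b)  # merge point B's segments into point A's ring
print(list(point_a))  # ['line 1', 'line 2', 'line 4']
print(list(point_b))  # []
```
&lt;/code&gt;&lt;/pre&gt;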
&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;The obvious strengths of this paper, things that were groundbreaking and fairly revolutionary for their time, were the ideas that later came to be known formally as object-oriented programming, polymorphism, and inheritance. It was interesting to read that one could manipulate a symbol's definition, for example changing the hexagons to fish scales, and the change would be reflected immediately in all instances of that symbol. The generic structure of objects, which made the system extensible, was not only ahead of its time but also fit the ideas of inheritance and polymorphism.&lt;br /&gt;&lt;br /&gt;One of the weaknesses is the horrible complexity of the Sketchpad system. Its modal input system, with its incredible number of buttons, knobs, and toggle switches, limits the use of Sketchpad to only a few people beyond its creator. However, I don't think it's fair to fault Sketchpad too much for this. At the time of its creation, 1963, computers didn't have complex operating systems or SDKs that allowed for the creation of very flexible programs. Software was still closely tied to specific hardware, the transistor was still fairly new (the MOSFET, a field-effect transistor, had been demonstrated only a few years earlier), and computer systems were built for fairly specific purposes. 
And, despite the complexity of its interface, Sketchpad was able to do some fairly amazing graphics manipulation for its time.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-7983848959954239731?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/7983848959954239731/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=7983848959954239731&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/7983848959954239731'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/7983848959954239731'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2007/08/sketchpad.html' title='Sketchpad'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-9000295771921458426</id><published>2007-08-30T07:45:00.000-05:00</published><updated>2007-08-30T07:46:25.793-05:00</updated><title type='text'>Introduction to Sketch Recognition</title><content type='html'>“Introduction to Sketch Recognition,” Tracy Hammond and Kenneth Mock&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Pen-based interfaces can be extremely useful, intuitive, and accurate in many domains, much more so than a mouse. Systems that use pen input include Tablet PCs, interactive white-/blackboards in classrooms, and personal digital assistants (PDAs). 
All of these devices can be either passive, where a stylus is used on a touch screen and a “tap” is equivalent to a mouse click, or active, where a special pen uses electromagnetic signals to relay its position to the display, moving the cursor without the pen having to touch the surface.&lt;br /&gt;&lt;br /&gt;One domain where pen-input systems excel, whether they use digital ink (saving a sketch as the raw sketch without performing any recognition to deduce meaning) or sketch recognition (trying to deduce meaning from drawings), is the classroom. In conjunction with other software/hardware combinations such as digital projectors and screen capture/recording programs (including audio), pen-input systems can be used to add dynamic content to static lectures, provide a medium for distributing, receiving, and grading electronic versions of homework assignments, and provide immediate feedback within the classroom setting. The notes jotted on lecture slides can be saved as digital ink, whereas supplementary materials like chemical formulae, mathematical equations, and physics demonstrations can be drawn/written and passed into a sketch recognition program to build 3D molecule models, plot function graphs, and animate physical interactions.&lt;br /&gt;&lt;br /&gt;The FLUID framework (based on the LADDER and GUILD sketch recognition systems) allows users to define new domains and systems to recognize new sets of sketches. 
The framework provides a method for obtaining pen-input data, assigns meaning to the drawings based on the user’s specifications, and can then pass this data on to another program (such as a CAD system for manipulating physical models of real-world objects).&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Discussion&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;I was skeptical about the authors’ claims that mice “do not have the natural feel of a pen, nor [do they] provide a pen’s accuracy.” As a longtime computer user who owns a high-grade optical mouse, I am quite satisfied with my level of accuracy and comfort with a mouse. However, I realize I am the exception: everyone is familiar with a pen, and anyone who can write on paper can write on a pen-input device with very little training. Getting that accuracy out of a mouse takes quite a bit more use and practice. Additionally, after using both the Wacom monitors and Tablet PCs in the lab, I realize just how easy it is to use a pen-input device. Yes, it is much more intuitive than a mouse.&lt;br /&gt;&lt;br /&gt;One of the ideas I like most about this paper is the use of digital ink for white-/blackboard presentations and lecture annotations. While in class, I almost constantly wish for some easy method to capture the content of the blackboard. I can transcribe what the professor writes into my own notes, but then I’m left without the flexibility of editing, copying, pasting, and “time lapse” that digital ink could provide. 
The paper left me very excited about the possibilities of pen-input and sketch recognition systems.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-9000295771921458426?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/9000295771921458426/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=9000295771921458426&amp;isPopup=true' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/9000295771921458426'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/9000295771921458426'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2007/08/introduction-to-sketch-recognition.html' title='Introduction to Sketch Recognition'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3062307596356887437.post-779389472381467440</id><published>2007-08-29T14:44:00.000-05:00</published><updated>2007-08-30T07:46:47.981-05:00</updated><title type='text'>About Me</title><content type='html'>My &lt;b&gt;name&lt;/b&gt; is Joshua Johnston.&lt;br /&gt;&lt;br /&gt;I am a &lt;b&gt;first year PhD&lt;/b&gt; student at Texas A&amp;M University with an MS Computer Science from Baylor University.&lt;br /&gt;&lt;br /&gt;My &lt;b&gt;email&lt;/b&gt; address is &lt;code&gt;myfirstname.mylastname at NEO-TAMU&lt;/code&gt; (or &lt;code&gt;jbjohns AT CSDL&lt;/code&gt;, which just forwards to my NEO account)&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try 
{parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_ReBMSji2hs4/RtXObA3LeVI/AAAAAAAAAAM/0_U61bSRjgU/s1600-h/joshBlogPhoto.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_ReBMSji2hs4/RtXObA3LeVI/AAAAAAAAAAM/0_U61bSRjgU/s320/joshBlogPhoto.jpg" alt="" id="BLOGGER_PHOTO_ID_5104212716177553746" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;My wife of 5 years was the Study Abroad Coordinator for the Computer Science Department at Baylor University for their trip to Shanghai, China. I got to tag along. This &lt;b&gt;photograph&lt;/b&gt; was taken in the garden at the base of the Oriental Pearl Tower, Asia's highest tower and the world's third highest tower.&lt;br /&gt;&lt;br /&gt;My &lt;b&gt;academic interests&lt;/b&gt; lie generally in the area of machine learning and artificial intelligence. My Master's thesis was specifically about unsupervised learning--applying mixture models of the exponential Dirichlet compound multinomial to cluster basic block vector data for the SimPoint application. Though still interested generally in ML and AI, I'd like to use the resources presented by a larger university and larger Computer Science department to see about applying ML and AI techniques to different areas, and to explore other areas not related to clustering to see if I might enjoy research in them.&lt;br /&gt;&lt;br /&gt;As stated, I have my MS Computer Science already, with my thesis on unsupervised learning, and also a BS Computer Science. I have a great deal of programming &lt;b&gt;experience&lt;/b&gt; in Java (language of personal choice, with many hours of use in both academia and industry) and MATLAB. 
I can code in C++ and am fairly fluent, but prefer Java (on a strictly I-use-it-more basis).&lt;br /&gt;&lt;br /&gt;I am &lt;b&gt;taking this class because&lt;/b&gt; there are a lot of opportunities to increase my ML/AI experience and apply the two fields to new domains (new to me). Plus, it's neat to see how the "magic" of a handwriting recognition system (for example) really works. One of the first applications I coded in my machine learning class was a classifier for handwritten zip-code digits. It's a hard problem, to be sure, and I'd like to see more information on what the state of the art is.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;From this class&lt;/b&gt;, I hope to gain new insight into how the difficult problems associated with assigning meaning to user input are solved in this domain. I've seen firsthand how difficult this problem can be elsewhere. In a domain with as much variability and flexibility as sketch recognition, how does one cope with these difficulties?&lt;br /&gt;&lt;br /&gt;In &lt;b&gt;5 years&lt;/b&gt;, I will hopefully have finished my PhD and be earning a lot of money as a professor at some prestigious university. If I can't do that, I'll settle for making even more money in industry. :) Either way, I want to research and push the field and myself farther. I always want to learn, and I feel research is one of the best ways to do that (as opposed to slinging code from 8 to 5).&lt;br /&gt;&lt;br /&gt;In &lt;b&gt;10 years&lt;/b&gt;, I hope to be tenured, well published, and really in my stride as a contributing professional in my research area (whatever that turns out to be). If I end up in industry rather than academia, I hope the only difference is that I don't have to worry about tenure.&lt;br /&gt;&lt;br /&gt;When I'm not doing academic-related things, &lt;b&gt;I like&lt;/b&gt; spending time with my wife. We enjoy hiking, camping, playing racquetball, and watching movies together. 
Since I'm such a computer nerd, I also enjoy playing video games. My wife...not so much. :)&lt;br /&gt;&lt;br /&gt;Fun Story: While in Shanghai, we had the unique experience of visiting a Chinese Wal-Mart. It was just that, an experience. For example, in their fresh seafood department, you could net-your-own catfish, eels, turtles, and frogs. In their fresh meat department, if you wanted a rack of spare ribs, you just picked one up out of the bin and chunked it in your buggy; plastic baggies or gloves were purely optional. Same thing with the bins of whole, defeathered chickens and chicken feet. It made me thankful for the USA, plastic wrap, and Styrofoam.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3062307596356887437-779389472381467440?l=jbjohns.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jbjohns.blogspot.com/feeds/779389472381467440/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3062307596356887437&amp;postID=779389472381467440&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/779389472381467440'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3062307596356887437/posts/default/779389472381467440'/><link rel='alternate' type='text/html' href='http://jbjohns.blogspot.com/2007/08/my-name-is-joshua-johnston.html' title='About Me'/><author><name>- D</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_ReBMSji2hs4/RtXObA3LeVI/AAAAAAAAAAM/0_U61bSRjgU/s72-c/joshBlogPhoto.jpg' height='72' 
width='72'/><thr:total>0</thr:total></entry></feed>
