Visual Recognition with Humans in the Loop - Robotics Institute Carnegie Mellon University

VASC Seminar

Serge Belongie, Professor, UCSD
Monday, March 26
3:00 pm to 12:00 am
Visual Recognition with Humans in the Loop

Event Location: NSH 1305
Bio: Serge Belongie received the B.S. degree (with honor) in
Electrical Engineering from the California Institute of Technology in
1995 and the M.S. and Ph.D. degrees in Electrical Engineering and
Computer Sciences (EECS) from U.C. Berkeley in 1997 and 2000,
respectively. While at Berkeley, his research was supported by a
National Science Foundation Graduate Research Fellowship. He is also a
co-founder of Digital Persona, Inc., and the principal architect of
the Digital Persona fingerprint recognition algorithm. He is currently
a Professor in the Computer Science and Engineering Department at U.C.
San Diego. His research interests include computer vision and pattern
recognition. He is a recipient of the NSF CAREER Award and the Alfred
P. Sloan Research Fellowship. In 2004 MIT Technology Review named him
to the list of the 100 top young technology innovators in the world
(TR100).

Abstract: We present an interactive, hybrid human-computer method for
object classification. The method applies to recognition problems that
are difficult for most people but can be solved by people with the
appropriate expertise (e.g., animal species or airplane model
recognition). The classification method can be seen as a visual
version of the 20 questions game, where questions based on simple
visual attributes are posed interactively. The goal is to identify the
true class while minimizing the number of questions asked, using the
visual content of the image. Incorporating user input drives up
recognition accuracy to levels that are good enough for practical
applications; at the same time, computer vision reduces the amount of
human interaction required. The resulting hybrid system is able to
handle difficult, large multi-class problems with tightly-related
categories. We introduce a general framework for incorporating almost
any off-the-shelf multi-class object recognition algorithm into the
visual 20 questions game, and provide methodologies to account for
imperfect user responses and unreliable computer vision algorithms. We
evaluate the accuracy and computational properties of different
computer vision algorithms and the effects of noisy user responses on
a dataset of 200 bird species and on the Animals With Attributes
dataset. Our results demonstrate the effectiveness and practicality of
the hybrid human-computer classification paradigm.
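
The abstract does not say how the next question is chosen or how answers are combined with the vision output, so the following is only a minimal sketch under stated assumptions: the computer vision algorithm is assumed to supply a prior over classes, each yes/no attribute question is assumed to have a known per-class probability of a "yes" answer (which models imperfect users), and the next question is assumed to be the one that minimizes expected posterior entropy. All names (`update`, `pick_question`, `p_yes`) are hypothetical and are not taken from the talk.

```python
import math


def entropy(p):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def update(posterior, q, answer, p_yes):
    """Bayes update of the class posterior after a yes/no answer to question q.

    p_yes[c][q] is the ASSUMED probability that a user answers "yes" to
    question q when the true class is c. Values strictly between 0 and 1
    model noisy user responses.
    """
    weighted = [
        posterior[c] * (p_yes[c][q] if answer else 1.0 - p_yes[c][q])
        for c in range(len(posterior))
    ]
    z = sum(weighted)
    return [w / z for w in weighted]


def expected_entropy(posterior, q, p_yes):
    """Expected entropy of the posterior after asking question q."""
    p_answer_yes = sum(posterior[c] * p_yes[c][q] for c in range(len(posterior)))
    total = 0.0
    for answer, p_a in ((True, p_answer_yes), (False, 1.0 - p_answer_yes)):
        if p_a > 0:
            total += p_a * entropy(update(posterior, q, answer, p_yes))
    return total


def pick_question(posterior, asked, p_yes):
    """Choose the unasked question with the lowest expected posterior entropy."""
    n_questions = len(p_yes[0])
    candidates = [q for q in range(n_questions) if q not in asked]
    return min(candidates, key=lambda q: expected_entropy(posterior, q, p_yes))


# Toy example with made-up numbers: 3 classes, 2 attribute questions.
vision_prior = [0.5, 0.3, 0.2]  # assumed output of a vision classifier
p_yes = [
    [0.9, 0.8],  # class 0
    [0.1, 0.8],  # class 1
    [0.1, 0.2],  # class 2
]

posterior = list(vision_prior)
asked = set()
while len(asked) < len(p_yes[0]) and max(posterior) < 0.95:
    q = pick_question(posterior, asked, p_yes)
    asked.add(q)
    simulated_answer = p_yes[0][q] > 0.5  # pretend the true class is 0
    posterior = update(posterior, q, simulated_answer, p_yes)
```

In this sketch a sharper vision prior means fewer questions are needed before one class dominates the posterior, which is one way to read the claim that computer vision reduces the amount of human interaction required. The stopping threshold of 0.95 is an arbitrary illustrative choice.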

This work is part of the Visipedia project, in collaboration with
Steve Branson, Catherine Wah, Florian Schroff, Boris Babenko, Peter
Welinder and Pietro Perona.