Richer descriptions for images: Attributes, Sentences, and Phrases

VASC Seminar

Ali Farhadi, PhD Candidate, University of Illinois at Urbana-Champaign
Wednesday, December 1
4:30 pm

Event Location: Intel Main Conference Room, 4th Floor, CIC Building
Bio: Ali Farhadi is a Ph.D. candidate in the Computer Science Department at the University of Illinois at Urbana-Champaign. His work, under the supervision of David Forsyth, focuses mainly on computer vision and machine learning. More specifically, he is interested in transfer learning and its application to aspect issues in human activity and object recognition, scene understanding, and attribute-based representation of objects. Ali has been awarded the inaugural Google Fellowship in Computer Vision and Image Interpretation and the University of Illinois CS/AI 2009 award.

Abstract: In this talk I will cover three approaches to providing richer descriptions of images. First, we use visual attributes to find and describe objects within broad domains. We describe objects by the spatial arrangement of their attributes and the interactions between them. Our system can find objects it has not seen before and can infer attributes such as function and pose. Our experiments demonstrate that we can locate and describe both familiar and unfamiliar objects more reliably than a baseline that relies purely on basic category detectors.
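
To make the attribute-based representation concrete, here is a minimal Python sketch, not the speaker's system: the attribute names, feature dimension, and random classifier weights are all invented stand-ins. It shows how an image window can be described by whichever attribute classifiers fire on it, even when no basic category detector does.

    import numpy as np

    rng = np.random.default_rng(0)
    FEATURE_DIM = 128

    # Stand-in linear classifiers, one weight vector per attribute; a real
    # system would train these on labeled attribute data.
    ATTRIBUTES = ["has_wheel", "has_leg", "is_furry", "made_of_metal"]
    WEIGHTS = {a: rng.normal(size=FEATURE_DIM) for a in ATTRIBUTES}

    def attribute_scores(window_features):
        """Confidence of each attribute classifier on one image window."""
        return {a: float(w @ window_features) for a, w in WEIGHTS.items()}

    def describe(window_features, threshold=0.0):
        """Describe a window by its confident attributes, strongest first.
        The description is available even for unfamiliar object categories."""
        scores = attribute_scores(window_features)
        firing = [a for a, s in scores.items() if s > threshold]
        return sorted(firing, key=lambda a: -scores[a])

    window = rng.normal(size=FEATURE_DIM)  # stand-in features for one window
    print(describe(window))                # e.g. ['has_leg', 'made_of_metal']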
Second, we describe images by sentences. Our system computes a score linking an image to a sentence. This score can be used to attach a descriptive sentence to a given image, or to retrieve images that illustrate a given sentence.
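
A minimal sketch of the single-score idea, with invented linear projections standing in for whatever intermediate representation the actual system learns: one score function links the two modalities, and both retrieval directions fall out of it.

    import numpy as np

    rng = np.random.default_rng(1)
    IMG_DIM, TXT_DIM, SHARED_DIM = 256, 300, 64

    # Invented projections into a shared space; stand-ins for the learned
    # mapping described in the talk.
    W_IMG = rng.normal(size=(SHARED_DIM, IMG_DIM))
    W_TXT = rng.normal(size=(SHARED_DIM, TXT_DIM))

    def score(image_feat, sentence_feat):
        """One score links an image and a sentence; higher is a better match."""
        u, v = W_IMG @ image_feat, W_TXT @ sentence_feat
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    def annotate(image_feat, sentences):
        """Attach the best-scoring (text, feature) sentence to an image."""
        return max(sentences, key=lambda s: score(image_feat, s[1]))[0]

    def illustrate(sentence_feat, images):
        """Retrieve the (name, feature) image that best illustrates a
        sentence: the same score, used in the opposite direction."""
        return max(images, key=lambda im: score(im[1], sentence_feat))[0]

    sentences = [(t, rng.normal(size=TXT_DIM))
                 for t in ["a person riding a horse", "a dog on a sofa"]]
    print(annotate(rng.normal(size=IMG_DIM), sentences))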
Third, we use visual phrases, complex visual composites like “a person riding a horse”, to describe images. Visual phrases often display significantly reduced visual complexity compared to their component objects. We show that a visual phrase detector significantly outperforms a baseline that detects the component objects and reasons about their relations. We introduce a novel decoding procedure that accurately accounts for local context without solving difficult inference problems. We show that decoding a combination of phrasal and object detectors produces real improvements.
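
One way to picture such a decoding is a greedy pass, sketched below with invented labels, context weights, and thresholds: accept the strongest remaining detection, let it locally rescore the detections it overlaps, and repeat, so no difficult global inference is ever solved.

    from dataclasses import dataclass

    @dataclass
    class Detection:
        label: str    # e.g. "person", "horse", or a phrase "person_riding_horse"
        box: tuple    # (x1, y1, x2, y2)
        score: float

    def iou(a, b):
        """Intersection-over-union of two boxes."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        union = area(a) + area(b) - inter
        return inter / union if union else 0.0

    # Invented local-context weights: how an accepted detection of one label
    # shifts the score of an overlapping detection of another label.
    CONTEXT = {("person_riding_horse", "person"): +0.3,
               ("person_riding_horse", "horse"):  +0.3,
               ("person", "horse"):               -0.1}

    def decode(detections, accept_thresh=0.5, min_iou=0.2):
        """Greedy decoding over phrasal and object detections together."""
        pending = sorted(detections, key=lambda d: d.score, reverse=True)
        accepted = []
        while pending and pending[0].score >= accept_thresh:
            best = pending.pop(0)
            accepted.append(best)
            for d in pending:  # local context: rescore only overlapping boxes
                if iou(best.box, d.box) >= min_iou:
                    d.score += CONTEXT.get((best.label, d.label), 0.0)
            pending.sort(key=lambda d: d.score, reverse=True)
        return accepted

    dets = [Detection("person_riding_horse", (10, 10, 90, 90), 0.90),
            Detection("person", (20, 10, 60, 80), 0.45),
            Detection("horse", (15, 30, 95, 95), 0.40)]
    print([d.label for d in decode(dets)])  # all three survive

In this toy run the confident phrase detection pulls its overlapping component objects above threshold, which is the flavor of local-context reasoning a combined phrasal-plus-object decoding can exploit.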