
VASC Seminar

Devi Parikh, Assistant Professor, Virginia Tech
Monday, March 23
3:00 pm to 5:00 pm
Words, Pictures, and Imagination

Event Location: NSH 1109
Bio: Devi Parikh is an Assistant Professor in the Bradley Department of Electrical and Computer Engineering at Virginia Tech (VT) and an Allen Distinguished Investigator of Artificial Intelligence. She leads the Computer Vision Lab at VT and is also a member of the Virginia Center for Autonomous Systems (VaCAS) and the VT Discovery Analytics Center (DAC). Prior to this, she was a Research Assistant Professor at the Toyota Technological Institute at Chicago (TTIC), an academic computer science institute affiliated with the University of Chicago. She has held visiting positions at Cornell University, the University of Texas at Austin, Microsoft Research, MIT, and Carnegie Mellon University. She received her M.S. and Ph.D. degrees from the Electrical and Computer Engineering department at Carnegie Mellon University in 2007 and 2009, respectively, and her B.S. in Electrical and Computer Engineering from Rowan University in 2005. Her research interests include computer vision, pattern recognition, and AI in general, and visual recognition problems in particular. Her recent work involves leveraging human-machine collaboration to build smarter machines. She has also worked on other topics such as ensembles of classifiers, data fusion, inference in probabilistic models, 3D reassembly, barcode segmentation, computational photography, interactive computer vision, contextual reasoning, and hierarchical representations of images. She is a recipient of the Carnegie Mellon Dean's Fellowship, the National Science Foundation Graduate Research Fellowship, Outstanding Reviewer Awards at CVPR 2012 and ECCV 2014, the Marr Best Paper Prize awarded at the International Conference on Computer Vision (ICCV) in 2011, two Google Faculty Research Awards in 2012 and 2014, the 2014 Army Research Office (ARO) Young Investigator Program (YIP) award, the Allen Distinguished Investigator Award in Artificial Intelligence from the Paul G. Allen Family Foundation in 2014, and an Outstanding New Assistant Professor award from the College of Engineering at Virginia Tech in 2015.

Abstract: As Computer Vision and Natural Language Processing techniques mature, there is heightened activity in exploring the connection between images and language. In this talk, I will present several recent and ongoing projects in my lab that take a new perspective on problems such as automatic image captioning, which have been receiving a lot of attention lately. In particular, I will start by describing a new methodology for evaluating image captioning approaches. I will then discuss image specificity: a concept capturing the phenomenon that some images are specific and elicit consistent descriptions from people, while other images are ambiguous and elicit a wider variety of descriptions from different people. Rather than treating this variance as noise, we model it as a signal, and we demonstrate that modeling image specificity improves performance in applications such as text-based image retrieval. I will then talk about our work on leveraging visual common sense for seemingly non-visual tasks such as textual fill-in-the-blanks or paraphrasing. We propose imagining the scene behind the text to solve these problems. The imagination need not be photorealistic, so we imagine the scene as a visual abstraction using clipart. We show that jointly reasoning about the imagined scene and the text yields better performance on these textual tasks than reasoning about the text alone. Finally, I will introduce a new task that pushes the understanding of language and vision beyond automatic image captioning: Visual Question Answering (VQA). Not only does it involve computer vision and natural language processing, but doing well at this task will also require the machine to reason about visual and non-visual common sense as well as factual knowledge bases. More importantly, it will require the machine to know when to tap which source of information. I will describe our ongoing efforts to collect a first-of-its-kind, large VQA dataset that will enable the community to explore this rich, challenging, and fascinating task, which pushes the frontier towards truly AI-complete problems.