3:30 pm to 4:30 pm
GHC 8102
Abstract: Our visual world is extraordinarily varied and complex, but despite its richness, the space of visual data may not be that astronomically large. We live in a well-structured, predictable world, where cars almost always drive on roads, the sky is always above the ground, and so on. As humans, the ability to learn this structure from prior experiences is essential to our visual perception. In fact, we effortlessly (and often unconsciously) employ this structure when perceiving and responding to our surroundings, a feat that still eludes our artificial systems. In this dissertation, we propose to discover and harness this structure to develop large-scale visual recognition systems.
In Part I, we present supervised recognition algorithms that can leverage these underlying regularities in our visual world. We propose effective models for object recognition that incorporate top-down contextual feedback, as well as models that can leverage the geometric structure of objects. We also develop supervised learning and inference methods that exploit the structure offered by visual data and by a wide range of recognition tasks.
These supervised systems, limited by our ability to collect annotations, are confined to curated datasets. Therefore, in Part II, we propose to overcome this limitation by discovering structure in large amounts of unlabeled visual data and incorporating it as constraints in large-scale semi-supervised learning algorithms to improve visual recognition systems.
Thesis Committee Members:
Abhinav Gupta, Chair
Martial Hebert
Deva Ramanan
Alexei A. Efros, University of California, Berkeley
Jitendra Malik, University of California, Berkeley