Discovering and Leveraging Visual Structure for Large-scale Recognition

PhD Thesis, Tech. Report, CMU-RI-TR-17-63, Robotics Institute, Carnegie Mellon University, August 2017

Abstract

Our visual world is extraordinarily varied and complex, but despite its richness, the space of visual data may not be astronomically large. We live in a well-structured, predictable world, where cars almost always drive on roads, the sky is always above the ground, and so on. For humans, the ability to learn this structure from prior experience is essential to visual perception. In fact, we effortlessly (and often unconsciously) employ this structure for perceiving and responding to our surroundings, a feat that still eludes our computational systems. In this dissertation, we propose to discover and harness this structure to improve large-scale visual recognition systems.

In Part I, we present supervised recognition algorithms that can leverage these underlying regularities in our visual world. We propose effective models for object recognition that incorporate top-down contextual feedback and models that can leverage the geometric structure of objects. We also develop supervised learning and inference methods that exploit the structure offered by visual data and by a wide range of recognition tasks.

These supervised systems, limited by our ability to collect annotations, are confined to curated datasets. Therefore, in Part II, we propose to overcome this limitation by automatically discovering structure in large amounts of visual data and incorporating it as constraints in large-scale semi-supervised learning algorithms to improve visual recognition systems.

BibTeX

@phdthesis{Shrivastava-2017-27375,
author = {Abhinav Shrivastava},
title = {Discovering and Leveraging Visual Structure for Large-scale Recognition},
year = {2017},
month = {August},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-17-63},
}