Inference Machines: Parsing Scenes via Iterated Predictions - Robotics Institute Carnegie Mellon University

PhD Thesis Proposal

Daniel Munoz, Carnegie Mellon University
Monday, October 10
10:00 am to 12:00 pm
Inference Machines: Parsing Scenes via Iterated Predictions

Event Location: NSH 3305

Abstract: Semantic understanding of the environment is critical for many robotic tasks such as path planning, mapping, and object tracking. While important progress has been made, extracting a rich representation of the environment remains a challenging problem. This process includes not only recognizing individual objects but also understanding the contextual relations among the objects, leading to a full understanding of the environment.


The prevalent method to encode such relationships is a joint probabilistic or energy-based model, which makes it natural to write down these interactions. Unfortunately, performing inference over these expressive models leads to an NP-hard optimization problem that must be approximated and, consequently, poses theoretical and empirical difficulties when learning the model. Furthermore, using approximate inference on any learned model often leads to suboptimal predictions due to the approximations. As we ultimately care about predicting the correct labeling of an environment, and not necessarily learning a joint model of the data, we instead view the approximate inference process as a modular procedure that is directly trained in order to produce correct labelings.


This thesis proposes a framework for training inference procedures that parse scenes in an iterative manner and is applicable to many domains. The inference procedure is composed of simple modules and is structured in a way that integrates contextual cues and feature descriptors computed at multiple resolutions in the scene. Our experiments demonstrate that our approach leads to state-of-the-art performance on many datasets in different domains while being computationally more efficient than other inference techniques in both theory and practice. Building upon our framework, we exploit the modular nature of the inference procedure and propose extensions that utilize: 1) combined sensory information (images and 3-D point clouds), 2) temporal information, and 3) unlabeled data.

Committee: J. Andrew Bagnell, Co-chair

Martial Hebert, Co-chair

Takeo Kanade

Yann LeCun, New York University