PhD Thesis Proposal

Stéphane Ross, Carnegie Mellon University
Friday, December 9
10:00 am to 12:00 pm
No-Regret Methods for Learning Sequential Predictions

Event Location: GHC 8102

Abstract: Sequential prediction problems arise in many areas of robotics and information processing. For instance, in robot navigation tasks, autonomous robots must take a sequence of actions, given a sequence of observations revealed to them over time, in order to reach the desired goal location. Similarly, complex information processing tasks, such as structured prediction problems in natural language processing and computer vision, can often be solved by constructing the desired output (e.g. the object present at every pixel in an image) from a sequence of simpler interdependent predictions (e.g. predicting the object at a pixel given the predictions at neighboring pixels).


Learning predictors that can perform these sequential tasks has become an important component of modern robotic systems. Unfortunately, learning in such sequential problems is challenging because the executed predictor and the data-generation process are inextricably intertwined. This often leads to a significant mismatch between the distribution of data observed during training (under the predictor used to generate training instances) and the distribution observed during test execution (under the learned predictor). As a result, naively applying standard statistical learning methods can yield a predictor that performs well during training but poorly at the task during test execution.


We address this issue by proposing a novel reduction of these hard learning tasks to online learning. This reduction allows us to leverage existing no-regret algorithms to obtain learning procedures that are guaranteed to find good predictors for test execution. Our research agenda explores several variations of this reduction for different scenarios. In control, we demonstrate that our method can be used for learning to imitate an expert performing the task (imitation learning) and for learning a model of the dynamics (system identification) in order to synthesize a (near-)optimal controller. In general structured prediction problems, we show that our method can be used to learn good predictors that construct the structured output from a sequence of predictions. We investigate the theoretical properties of our approach and its application to a number of large-scale control and structured prediction problems. We present preliminary experimental results in the context of imitation learning in two electronic games, system identification for optimal control of a simulated helicopter, handwriting recognition, and scene understanding tasks such as 3D point cloud classification and 3D surface layout estimation from single images.


Further work is proposed in system identification for partially observable domains and inverse optimal control tasks, as well as for learning compressed models that can be used to efficiently solve optimal control problems. Additional applications on real robotic platforms (UAV and robotic arm) are proposed.

Committee: J. Andrew Bagnell, Chair

Geoffrey J. Gordon

Chris Atkeson

John Langford, Yahoo! Research