Inverse Optimal Heuristic Control for Imitation Learning
Abstract
One common approach to imitation learning is behavioral cloning (BC), which employs straightforward supervised learning (i.e., classification) to directly map observations to controls. A second approach is inverse optimal control (IOC), which formalizes the problem of learning sequential decision-making behavior over long horizons as a problem of recovering a utility function that explains observed behavior. This paper presents inverse optimal heuristic control (IOHC), a novel approach to imitation learning that capitalizes on the strengths of both paradigms. It employs long-horizon IOC-style modeling in a low-dimensional space where inference remains tractable, while incorporating an additional descriptive set of BC-style features to guide higher-dimensional overall action selection. We provide experimental results demonstrating the capabilities of our model on a simple illustrative problem as well as on two real-world problems: turn prediction for taxi drivers, and pedestrian prediction within an office environment.
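The combination described above can be illustrated with a short sketch: a stochastic policy that scores each action by adding a BC-style feature score to an IOC-style cost-to-go heuristic evaluated at the successor state, then normalizes with a softmax. This is a minimal illustration of the idea under stated assumptions, not the paper's implementation; the helpers `bc_features`, `heuristic_cost_to_go`, and `transition`, along with the weights `w_bc` and `w_ioc`, are hypothetical placeholders.

```python
import numpy as np

def iohc_policy(state, actions, bc_features, heuristic_cost_to_go,
                transition, w_bc, w_ioc):
    """Boltzmann action distribution combining BC-style action features
    with an IOC-style cost-to-go heuristic (illustrative sketch only)."""
    scores = []
    for a in actions:
        # BC term: linear score over descriptive, high-dimensional
        # action features of the (state, action) pair.
        bc_term = w_bc @ bc_features(state, a)
        # IOC term: cost-to-go of the successor state, computed in a
        # low-dimensional space where long-horizon inference is tractable.
        ioc_term = w_ioc * heuristic_cost_to_go(transition(state, a))
        # Lower combined cost should mean higher action probability.
        scores.append(-(bc_term + ioc_term))
    scores = np.asarray(scores)
    # Softmax over negated costs yields the stochastic policy.
    p = np.exp(scores - scores.max())
    return p / p.sum()
```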
BibTeX
@conference{Ratliff-2009-10187,
author = {Nathan Ratliff and Brian D. Ziebart and Kevin Peterson and J. Andrew (Drew) Bagnell and Martial Hebert and Anind Dey and Siddhartha Srinivasa},
title = {Inverse Optimal Heuristic Control for Imitation Learning},
booktitle = {Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS '09)},
year = {2009},
month = {April},
pages = {424--431},
keywords = {imitation learning, apprenticeship learning, inverse optimal control, behavioral cloning, planning, stochastic policies, people prediction, taxi route prediction},
}