Carnegie Mellon University
9:45 am to 11:00 am
GHC 4405
Learning from demonstration is an intuitive approach to encoding complex behaviors in autonomous agents. Through observation and mimicry alone, learners have shown success in challenging tasks such as autonomous driving, aerial obstacle avoidance, and information gathering. State-of-the-art algorithms like Dataset Aggregation (DAgger) have made significant advances over traditional behavior cloning, demonstrating strong theoretical and empirical results. However, these methods typically impose large sampling burdens on experts, which may restrict the types of demonstrators or problems that can be addressed.
In this work we propose a modified version of the DAgger algorithm aimed at reducing expert queries while maintaining learner performance. Randomly initialized policies typically induce state distributions unlike those of the final policy, leading to wasted expert labeling, especially early in training. By increasing the rate of policy updates, we aim to collect labeled data that is more relevant relative to the total number of queries. In addition, we incorporate several supervised active learning approaches into our query selection, allowing policy uncertainty to inform when expert labels are requested. We demonstrate our algorithm on a variety of simulated robot manipulation and control tasks.
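To make the two ideas concrete, the following is a minimal sketch of a DAgger-style loop that updates the policy after every rollout and queries the expert only where an ensemble of learners disagrees. Everything here is illustrative, not the speaker's implementation: the 1-D toy dynamics (step), the hand-coded PD expert (expert_action), the EnsemblePolicy class, and the disagreement threshold are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

def expert_action(state):
    # Hand-coded PD controller standing in for a human/oracle expert.
    pos, vel = state
    return -1.0 * pos - 0.5 * vel

def step(state, action, dt=0.1):
    # Toy 1-D double-integrator dynamics.
    pos, vel = state
    return np.array([pos + vel * dt, vel + action * dt])

class EnsemblePolicy:
    """Bootstrap ensemble of linear policies; member disagreement
    serves as a simple proxy for policy uncertainty."""
    def __init__(self, n_models=5):
        self.weights = [rng.normal(size=2) for _ in range(n_models)]

    def predict_all(self, state):
        return np.array([w @ state for w in self.weights])

    def act(self, state):
        return self.predict_all(state).mean()

    def uncertainty(self, state):
        return self.predict_all(state).std()

    def fit(self, states, actions):
        X, y = np.asarray(states), np.asarray(actions)
        for i in range(len(self.weights)):
            idx = rng.integers(0, len(X), len(X))  # bootstrap resample
            w, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
            self.weights[i] = w

policy = EnsemblePolicy()
data_s, data_a = [], []
threshold = 0.05  # query the expert only above this disagreement
queries = 0

for rollout in range(30):  # update after every rollout, not in large batches
    state = rng.normal(size=2)
    for t in range(50):
        # Active query selection: label a state only when uncertain
        # (or while the initial dataset is still too small to fit).
        if policy.uncertainty(state) > threshold or len(data_s) < 10:
            data_s.append(state.copy())
            data_a.append(expert_action(state))
            queries += 1
        state = step(state, policy.act(state))  # learner controls the rollout
    # Frequent refits keep the visited state distribution closer to
    # that of the final policy, so fewer labels are wasted early on.
    policy.fit(data_s, data_a)

print(f"expert queries used: {queries}")
```

The ensemble here is only one possible uncertainty estimate; dropout-based or density-based measures from the supervised active learning literature would slot into the same gating test.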
Committee:
William “Red” Whittaker (Advisor)
David Held
Wen Sun