Learning Robot Motion Control with Demonstration and Advice-Operators
Abstract
As robots become more commonplace within society, the need for tools that enable non-robotics-experts to develop control algorithms, or policies, will increase. Learning from Demonstration (LfD) offers one promising approach, in which the robot learns a policy from teacher task executions. Our interest lies in robot motion control policies, which map world observations to continuous low-level actions. In this work, we introduce Advice-Operator Policy Improvement (A-OPI) as a novel approach for improving policies within LfD. Two characteristics distinguish the A-OPI algorithm: its data source and its continuous state-action space. Within LfD, more example data can improve a policy. In A-OPI, new data is synthesized from a student execution and teacher advice, whereas typical demonstration approaches provide the learner with exclusively teacher executions. A-OPI is effective within continuous state-action spaces because high-level human advice is translated into continuous-valued corrections on the student execution. This work presents a first implementation of the A-OPI algorithm, validated on a Segway RMP robot performing a spatial positioning task. A-OPI is found to improve task performance, in both success and accuracy. Furthermore, its performance is shown to be similar or superior to that of the typical approach using exclusively teacher demonstrations.
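The mechanism described above, translating a piece of high-level teacher advice into a continuous-valued correction on the robot's own recorded execution and adding the corrected data to the demonstration set, can be sketched as follows. This is a minimal illustrative sketch under stated assumptions, not the paper's implementation: the 2-D action representation (translational and rotational speed), the operator names, and the numeric corrections are all placeholders invented for the example.

```python
import numpy as np

# Hypothetical advice-operators, each mapping a recorded action to a
# corrected one. Actions here are [translational speed, rotational speed].
# The operators used in the paper are task-specific; these are placeholders.
ADVICE_OPERATORS = {
    "turn_more_left":  lambda a: a + np.array([0.0, 0.1]),
    "turn_more_right": lambda a: a - np.array([0.0, 0.1]),
    "slow_down":       lambda a: a * np.array([0.8, 1.0]),
}


def apply_advice(execution, advice, segment):
    """Synthesize new data points by applying an advice-operator to the
    advised portion of a student execution.

    execution : list of (observation, action) pairs from the robot's own run
    advice    : name of the operator selected by the teacher
    segment   : (start, end) indices over which the advice applies
    """
    op = ADVICE_OPERATORS[advice]
    start, end = segment
    return [(obs, op(act) if start <= i < end else act)
            for i, (obs, act) in enumerate(execution)]


if __name__ == "__main__":
    # A toy student execution: five timesteps of driving straight ahead.
    execution = [(np.array([float(t), 0.0]), np.array([0.5, 0.0]))
                 for t in range(5)]

    # The teacher advises the robot to turn more to the left over the last
    # three timesteps; the corrected data is added to the training set,
    # which would then be used to re-derive the policy.
    dataset = list(execution)
    dataset += apply_advice(execution, "turn_more_left", segment=(2, 5))

    for obs, act in dataset:
        print(obs, act)
```

In contrast to collecting further teacher demonstrations, the corrected data points above are grounded in states the student actually visited, which is the key distinction the abstract draws between A-OPI and exclusively teacher-provided data.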
BibTeX
@conference{Argall-2008-10088,
  author    = {Brenna Argall and Brett Browning and Manuela Veloso},
  title     = {Learning Robot Motion Control with Demonstration and Advice-Operators},
  booktitle = {Proceedings of (IROS) IEEE/RSJ International Conference on Intelligent Robots and Systems},
  year      = {2008},
  month     = {September},
  pages     = {399--404},
}