Learning Mobile Robot Motion Control from Demonstration and Corrective Feedback
Abstract
Fundamental to the successful, autonomous operation of mobile robots are robust motion control algorithms. Motion control algorithms determine an appropriate action to take based on the current state of the world. A robot observes the world through sensors and executes physical actions through actuation mechanisms. Sensors are noisy and can mislead, however, and actions are non-deterministic and thus execute with uncertainty. Furthermore, the trajectories produced by the physical motion devices of mobile robots are complex, which makes them difficult to model and treat with traditional control approaches. Developing motion control algorithms for mobile robots thus poses a significant challenge, even for simple motion behaviors, and as behaviors become more complex, the generation of appropriate control algorithms only becomes more challenging. One target application of this thesis work is the development of sophisticated motion behaviors for a dynamically balancing differential drive mobile robot. Not only are the desired behaviors complex, but prior experience developing motion behaviors for this robot through traditional means proved tedious and demanded a high level of expertise.

One approach that mitigates many of these challenges is to develop motion control algorithms within a Learning from Demonstration (LfD) paradigm. Here, a behavior is represented as pairs of states and actions; more specifically, the states encountered and the actions executed by a teacher during demonstration of the motion behavior. The control algorithm is generated by the robot learning a policy, or mapping from world observations to robot actions, that is able to reproduce the demonstrated motion behavior. Robot executions with any policy, including those learned from demonstration, may at times exhibit poor performance; for example, when encountering areas of the state-space unseen during demonstration. Execution experience of this sort can be used by a teacher to correct and update a policy, and thus to improve performance and robustness.

The approaches for motion control algorithm development introduced in this thesis pair demonstration learning with human feedback on execution experience. The contributed feedback framework does not require revisiting areas of the execution space in order to provide feedback, a key advantage for mobile robot behaviors, for which revisiting an exact state can be expensive and often impossible. The types of feedback this thesis introduces range from binary indications of performance quality to execution corrections. In particular, advice-operators are a mechanism through which continuous-valued corrections are provided for multiple execution points. The advice-operator formulation is thus appropriate for low-level motion control, which operates in continuous-valued action spaces sampled at high frequency.

This thesis contributes multiple algorithms that develop motion control policies for mobile robot behaviors and that incorporate feedback in various ways. Our algorithms use feedback to refine demonstrated policies, as well as to build new policies through the scaffolding of simple motion behaviors learned from demonstration. We evaluate our algorithms empirically, both within simulated motion control domains and on a real robot. We show that feedback improves policy performance on simple behaviors and enables policy execution of more complex behaviors. Results with the Segway RMP robot confirm the effectiveness of the algorithms in developing and correcting motion control policies on a mobile robot.
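The core data flow described above, learning a policy from demonstrated state-action pairs and later refining it with continuous-valued corrections over a segment of execution points, can be sketched concretely. The Python sketch below is illustrative only and is not taken from the thesis: the 1-nearest-neighbor learner, the (speed, turn_rate) action encoding, and the scale_turn_rate operator are all hypothetical stand-ins chosen to show how demonstration, execution, correction, and policy update fit together.

import numpy as np

# Minimal LfD sketch: a policy maps observed states to actions by
# regressing over teacher demonstrations (here, 1-nearest-neighbor;
# an assumption, not the thesis's specific learner).
class DemoPolicy:
    def __init__(self, states, actions):
        self.states = np.asarray(states, dtype=float)    # demonstrated states
        self.actions = np.asarray(actions, dtype=float)  # paired teacher actions

    def predict(self, state):
        # Return the action paired with the closest demonstrated state.
        dists = np.linalg.norm(self.states - np.asarray(state, dtype=float), axis=1)
        return self.actions[np.argmin(dists)]

# Hypothetical advice-operator: a continuous-valued correction applied
# uniformly over a contiguous segment of execution points, yielding new
# state-action pairs with which to refine the policy.
def scale_turn_rate(exec_states, exec_actions, segment, factor):
    corrected = exec_actions.copy()
    lo, hi = segment
    corrected[lo:hi, 1] *= factor  # assumed encoding: action = [speed, turn_rate]
    return exec_states[lo:hi], corrected[lo:hi]

# Illustrative demonstration data: states are (distance_to_goal,
# heading_error), actions are (speed, turn_rate).
demo_states = [[2.0, 0.5], [1.0, 0.2], [0.5, 0.0]]
demo_actions = [[0.8, 0.4], [0.5, 0.1], [0.2, 0.0]]
policy = DemoPolicy(demo_states, demo_actions)

# After an execution, the teacher judges the first three points to turn
# too sharply and applies the operator; the corrected pairs are added to
# the dataset, updating the policy without revisiting those states.
exec_s = np.array([[1.8, 0.6], [1.5, 0.5], [1.2, 0.4]])
exec_a = np.array([policy.predict(s) for s in exec_s])
new_s, new_a = scale_turn_rate(exec_s, exec_a, segment=(0, 3), factor=0.5)
policy = DemoPolicy(np.vstack([policy.states, new_s]),
                    np.vstack([policy.actions, new_a]))
print(policy.predict([1.6, 0.55]))  # prediction now reflects the correction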
BibTeX
@phdthesis{Argall-2009-10180,
author = {Brenna Argall},
title = {Learning Mobile Robot Motion Control from Demonstration and Corrective Feedback},
year = {2009},
month = {March},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-09-13},
}