Active Reward Learning with a Novel Acquisition Function

Christian Daniel, Oliver Kroemer, Malte Viering, Jan Metz, and Jan Peters

Journal Article, Autonomous Robots, Vol. 39, No. 3, pp. 389-405, October 2015


Abstract

Reward functions are an essential component of many robot learning methods. Defining such functions, however, remains hard in many practical applications. For tasks such as grasping, there are no reliable success measures available. Defining reward functions by hand requires extensive task knowledge and often leads to undesired emergent behavior. We introduce a framework, wherein the robot simultaneously learns an action policy and a model of the reward function by actively querying a human expert for ratings. We represent the reward model using a Gaussian process and evaluate several classical acquisition functions from the Bayesian optimization literature in this context. Furthermore, we present a novel acquisition function, expected policy divergence. We demonstrate results of our method for a robot grasping task and show that the learned reward function generalizes to a similar task. Additionally, we evaluate the proposed novel acquisition function on a real robot pendulum swing-up task.
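
The active-query loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a zero-mean Gaussian process with a squared-exponential kernel, a one-dimensional outcome feature, and it selects queries by maximum predictive variance, a classical uncertainty-based criterion rather than the expected policy divergence proposed in the paper. The names `rbf_kernel`, `gp_posterior`, `select_query`, and `expert_rating`, as well as all hyperparameter values, are hypothetical and chosen only for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.5, signal_var=1.0):
    """Squared-exponential kernel between point sets a (n, d) and b (m, d)."""
    sq_dist = (np.sum(a**2, axis=1)[:, None]
               + np.sum(b**2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def gp_posterior(x_rated, ratings, x_cand, noise_var=1e-2):
    """Posterior mean and variance of the reward model at candidate outcomes."""
    k_xx = rbf_kernel(x_rated, x_rated) + noise_var * np.eye(len(x_rated))
    k_xc = rbf_kernel(x_rated, x_cand)
    chol = np.linalg.cholesky(k_xx)
    # alpha = K^{-1} y, computed through the Cholesky factor
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, ratings))
    v = np.linalg.solve(chol, k_xc)
    mean = k_xc.T @ alpha
    var = np.diag(rbf_kernel(x_cand, x_cand)) - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)

def select_query(x_rated, ratings, x_cand):
    """Index of the candidate whose reward is most uncertain under the GP."""
    _, var = gp_posterior(x_rated, ratings, x_cand)
    return int(np.argmax(var))

def expert_rating(x):
    """Hypothetical stand-in for the human expert: a synthetic rating function."""
    return np.exp(-np.sum((x - 0.7) ** 2, axis=1) / 0.05)

# Candidate outcomes the robot could ask the expert to rate (synthetic data).
rng = np.random.default_rng(0)
x_cand = rng.uniform(0.0, 1.0, size=(50, 1))

# Start from two rated outcomes, then issue five active queries.
x_rated = x_cand[:2].copy()
ratings = expert_rating(x_rated)
for _ in range(5):
    idx = select_query(x_rated, ratings, x_cand)
    x_rated = np.vstack([x_rated, x_cand[idx:idx + 1]])
    ratings = np.append(ratings, expert_rating(x_cand[idx:idx + 1]))
```

In the framework the abstract describes, the policy is learned simultaneously with the reward model; this sketch covers only the reward-model side and leaves the policy update out.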

BibTeX

@article{Daniel-2015-112213,
author = {Christian Daniel and Oliver Kroemer and Malte Viering and Jan Metz and Jan Peters},
title = {Active Reward Learning with a Novel Acquisition Function},
journal = {Autonomous Robots},
year = {2015},
month = {October},
volume = {39},
number = {3},
pages = {389--405},
}

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.