
Reward Shaping by Demonstration

Halit Bener Suay, Tim Brys, Matthew E. Taylor, and Sonia Chernova
Conference Paper, Proceedings of the 2nd Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM '15), June 2015

Abstract

Potential-based reward shaping is a theoretically sound way of incorporating prior knowledge into a reinforcement learning setting. While it leaves the choice of potential function open, under certain conditions the method guarantees convergence to the same final policy regardless of the properties of the potential function. However, this freedom of choice can complicate design decisions for a specific domain, as the number of possible candidates for a potential function can be overwhelming. Moreover, the potential function can either be manually designed to bias the learner's behavior, or recovered from prior knowledge, e.g., from human demonstrations. In this paper we investigate the efficacy of two different methods of using a potential function recovered from human demonstrations. Our first approach uses a mixture of Gaussian distributions generated from samples collected during demonstrations (Gaussian-Shaping); the second uses a reward function recovered from demonstrations with Relative Entropy Inverse Reinforcement Learning (RE-IRL-Shaping). We present our findings in the Cart-Pole, Mountain Car, and Puddle World domains. Our results show that Gaussian-Shaping can provide an efficient reward heuristic, accelerating learning through its ability to capture local information, while RE-IRL-Shaping can be more resilient to poor demonstrations. We report a brief analysis of our findings, intended as a reference for reinforcement learning agent designers who consider reward shaping by human demonstration.
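For readers unfamiliar with the mechanics, the following is a minimal Python sketch of the Gaussian-Shaping idea, not the authors' implementation: it assumes a potential Phi(s) built as an unnormalized mixture of Gaussian kernels centered on demonstrated states, and adds the standard potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s) (Ng et al., 1999) to the environment reward. The kernel width sigma and the demonstration states used here are hypothetical.

import numpy as np

def make_gaussian_potential(demo_states, sigma=0.2):
    """Potential Phi(s): unnormalized mixture of Gaussian kernels centered
    on states visited during demonstrations (illustrative sketch; the
    paper's exact construction may differ)."""
    demo_states = np.asarray(demo_states, dtype=float)

    def phi(state):
        diffs = demo_states - np.asarray(state, dtype=float)
        sq_dists = np.sum(diffs ** 2, axis=1)
        return float(np.mean(np.exp(-sq_dists / (2.0 * sigma ** 2))))

    return phi

def shaping_reward(phi, state, next_state, gamma=0.99):
    """Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s)
    (Ng et al., 1999), added to the environment's reward."""
    return gamma * phi(next_state) - phi(state)

# Hypothetical Mountain Car demonstration states: (position, velocity) pairs.
demos = [(-0.50, 0.00), (-0.30, 0.02), (0.10, 0.04)]
phi = make_gaussian_potential(demos, sigma=0.2)

env_reward = -1.0                          # Mountain Car's per-step reward
s, s_next = (-0.45, 0.01), (-0.40, 0.02)   # one hypothetical transition
print(env_reward + shaping_reward(phi, s, s_next, gamma=0.99))

Because the shaping term telescopes along any trajectory, adding F to the environment reward leaves the set of optimal policies unchanged while steering exploration toward demonstrated states.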

BibTeX

@conference{Suay-2015-126563,
  author    = {Halit Bener Suay and Tim Brys and Matthew E. Taylor and Sonia Chernova},
  title     = {Reward Shaping by Demonstration},
  booktitle = {Proceedings of 2nd Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM '15)},
  year      = {2015},
  month     = {June},
}