Learning from demonstration for shaping through inverse reinforcement learning

Halit Bener Suay, Tim Brys, Matthew E. Taylor, and Sonia Chernova
Conference Paper, Proceedings of 15th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS '16), pp. 429 - 437, May, 2016

Abstract

Model-free episodic reinforcement learning problems define the environment reward with functions that often provide only sparse information throughout the task. Consequently, agents are not given enough feedback about the fitness of their actions until the task ends with success or failure. Previous work addresses this problem with reward shaping. In this paper we introduce a novel three-step approach to improve the performance of model-free reinforcement learning agents. Specifically, we collect demonstration data, use the data to recover a linear function via inverse reinforcement learning, and use the recovered function for potential-based reward shaping. Our approach is model-free and scalable to high-dimensional domains. To show the scalability of our approach we present two sets of experiments: one in a two-dimensional Maze domain and one in the 27-dimensional Mario AI domain. We compare the performance of our algorithm to previously introduced reinforcement learning from demonstration algorithms. Our experiments show that our approach outperforms the state-of-the-art in cumulative reward, learning rate, and asymptotic performance.
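
The following is a minimal sketch, not the authors' implementation, of the third step of the approach: potential-based reward shaping with a linear potential Phi(s) = w . phi(s). It assumes the weight vector w has already been recovered by an inverse reinforcement learning step on demonstration data, and that phi(s) is the same state feature vector used by the learning agent; all names and values here are illustrative placeholders.

# Sketch of potential-based reward shaping with an IRL-recovered linear potential.
# Assumes w was obtained from inverse reinforcement learning on demonstrations.
import numpy as np

GAMMA = 0.99  # discount factor (assumed value)

def potential(state_features: np.ndarray, w: np.ndarray) -> float:
    """Phi(s) = w . phi(s), a linear potential over state features."""
    return float(np.dot(w, state_features))

def shaped_reward(r_env: float,
                  phi_s: np.ndarray,
                  phi_s_next: np.ndarray,
                  w: np.ndarray,
                  gamma: float = GAMMA) -> float:
    """Add the shaping term F(s, s') = gamma * Phi(s') - Phi(s) to the sparse
    environment reward; potential-based shaping leaves the optimal policy unchanged."""
    return r_env + gamma * potential(phi_s_next, w) - potential(phi_s, w)

# Hypothetical usage with placeholder IRL weights and feature vectors.
if __name__ == "__main__":
    w = np.array([0.5, -0.2, 1.0])          # placeholder weights from IRL
    phi_s = np.array([1.0, 0.0, 0.0])       # features of current state
    phi_s_next = np.array([0.0, 1.0, 0.0])  # features of next state
    print(shaped_reward(0.0, phi_s, phi_s_next, w))

In this sketch the shaped reward would simply replace the sparse environment reward wherever the learning agent performs its value update, so denser feedback is available before the episode terminates.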

BibTeX

@conference{Suay-2016-126553,
author = {Halit Bener Suay and Tim Brys and Matthew E. Taylor and Sonia Chernova},
title = {Learning from demonstration for shaping through inverse reinforcement learning},
booktitle = {Proceedings of 15th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS '16)},
year = {2016},
month = {May},
pages = {429--437},
}