Algorithms for Learning Markov Field Policies
Abstract
We use a graphical model for representing policies in Markov Decision Processes. This new representation can easily incorporate domain knowledge in the form of a state similarity graph that loosely indicates which states are expected to have similar optimal actions. A bias is then introduced into the policy search process by sampling policies from a distribution that assigns high probabilities to policies that agree with the provided state similarity graph, i.e., smoother policies. This distribution corresponds to a Markov Random Field. We also present forward and inverse reinforcement learning algorithms for learning such policy distributions. We illustrate the advantage of the proposed approach on two problems: cart-balancing with swing-up, and teaching a robot to grasp unknown objects.
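The central construction, a Markov Random Field distribution over policies that assigns higher probability when states connected in the similarity graph share an action, can be sketched in a few lines. The Python below is a minimal illustration rather than the paper's algorithm: the Potts-style energy, the chain-shaped similarity graph, the edge weights, the temperature `beta`, and the Metropolis-Hastings sampler are all illustrative assumptions.

```python
import math
import random

def mrf_log_prior(policy, edges, beta=1.0):
    """Unnormalized log-probability of a deterministic policy (a dict
    mapping state -> action) under a Potts-style MRF prior:
    log p(pi) proportional to beta * sum over edges (s, t, w) of
    w * 1[pi(s) == pi(t)], so policies that assign equal actions to
    similar states score higher (i.e., are "smoother")."""
    return beta * sum(w for s, t, w in edges if policy[s] == policy[t])

def sample_smooth_policy(states, actions, edges, beta=1.0,
                         n_steps=5000, seed=0):
    """Metropolis-Hastings sampling from the MRF policy prior:
    propose changing one state's action, accept with the usual
    Metropolis ratio."""
    rng = random.Random(seed)
    policy = {s: rng.choice(actions) for s in states}
    logp = mrf_log_prior(policy, edges, beta)
    for _ in range(n_steps):
        s = rng.choice(states)
        old_action = policy[s]
        policy[s] = rng.choice(actions)   # symmetric proposal
        new_logp = mrf_log_prior(policy, edges, beta)
        delta = new_logp - logp
        if delta >= 0 or rng.random() < math.exp(delta):
            logp = new_logp               # accept the change
        else:
            policy[s] = old_action        # reject and revert
    return policy

if __name__ == "__main__":
    # Four states on a chain; adjacent states are declared "similar".
    states = [0, 1, 2, 3]
    actions = ["left", "right"]
    edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
    print(sample_smooth_policy(states, actions, edges, beta=2.0))
    # With large beta, sampled policies tend toward one action everywhere.
```

In the paper's setting, this smoothness prior would be combined with a reward-driven objective during forward or inverse reinforcement learning; the sketch only shows how the similarity graph biases sampling toward smoother policies.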
BibTeX
@conference{Boularias-2012-112191,
  author    = {Abdeslam Boularias and Oliver Kroemer and Jan Peters},
  title     = {Algorithms for Learning Markov Field Policies},
  booktitle = {Proceedings of Neural Information Processing Systems (NeurIPS)},
  year      = {2012},
  month     = {December},
  volume    = {2},
  pages     = {2177--2185},
}