Algorithms for Learning Markov Field Policies
Abstract
We use a graphical model for representing policies in Markov Decision Processes. This new representation can easily incorporate domain knowledge in the form of a state similarity graph that loosely indicates which states are expected to have similar optimal actions. A bias is then introduced into the policy search process by sampling policies from a distribution that assigns high probabilities to policies that agree with the provided state similarity graph, i.e., smoother policies. This distribution corresponds to a Markov Random Field. We also present forward and inverse reinforcement learning algorithms for learning such policy distributions. We illustrate the advantage of the proposed approach on two problems: cart-balancing with swing-up, and teaching a robot to grasp unknown objects.
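The central construction, a Markov Random Field distribution over policies that assigns higher probability when states connected in the similarity graph share an action, can be sketched in a few lines. The Python below is a minimal illustration rather than the paper's algorithm: the Potts-style energy, the chain-shaped similarity graph, the edge weights, the temperature `beta`, and the Metropolis-Hastings sampler are all illustrative assumptions.

```python
import math
import random

def mrf_log_prior(policy, edges, beta=1.0):
    """Unnormalized log-probability of a deterministic policy (a dict
    mapping state -> action) under a Potts-style MRF prior:
    log p(pi) proportional to beta * sum over edges (s, t, w) of
    w * 1[pi(s) == pi(t)], so policies that assign equal actions to
    similar states score higher (i.e., are "smoother")."""
    return beta * sum(w for s, t, w in edges if policy[s] == policy[t])

def sample_smooth_policy(states, actions, edges, beta=1.0,
                         n_steps=5000, seed=0):
    """Metropolis-Hastings sampling from the MRF policy prior:
    propose changing one state's action, accept with the usual
    Metropolis ratio."""
    rng = random.Random(seed)
    policy = {s: rng.choice(actions) for s in states}
    logp = mrf_log_prior(policy, edges, beta)
    for _ in range(n_steps):
        s = rng.choice(states)
        old_action = policy[s]
        policy[s] = rng.choice(actions)   # symmetric proposal
        new_logp = mrf_log_prior(policy, edges, beta)
        delta = new_logp - logp
        if delta >= 0 or rng.random() < math.exp(delta):
            logp = new_logp               # accept the change
        else:
            policy[s] = old_action        # reject and revert
    return policy

if __name__ == "__main__":
    # Four states on a chain; adjacent states are declared "similar".
    states = [0, 1, 2, 3]
    actions = ["left", "right"]
    edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
    print(sample_smooth_policy(states, actions, edges, beta=2.0))
    # With large beta, sampled policies tend toward one action everywhere.
```

In the paper's setting, this smoothness prior would be combined with a reward-driven objective during forward or inverse reinforcement learning; the sketch only shows how the similarity graph biases sampling toward smoother policies.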
BibTeX
@conference{Boularias-2012-112191,
  author    = {Abdeslam Boularias and Oliver Kroemer and Jan Peters},
  title     = {Algorithms for Learning Markov Field Policies},
  booktitle = {Proceedings of Neural Information Processing Systems (NeurIPS)},
  year      = {2012},
  month     = {December},
  volume    = {2},
  pages     = {2177--2185},
}