Covariant Policy Search

Conference Paper, Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI '03), pp. 1019–1024, August 2003

Abstract

We investigate the problem of non-covariant behavior of policy gradient reinforcement learning algorithms. The policy gradient approach is amenable to analysis by information geometric methods. This leads us to propose a natural metric on controller parameterization that results from considering the manifold of probability distributions over paths induced by a stochastic controller. Investigation of this approach leads to a covariant gradient ascent rule. Interesting properties of this rule are discussed, including its relation with actor-critic style reinforcement learning algorithms. The algorithms discussed here are computationally quite efficient and on some interesting problems lead to dramatic performance improvement over non-covariant rules.
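
The covariant rule the abstract refers to preconditions the ordinary policy gradient with the inverse of a Fisher information metric derived from the distribution over paths. The sketch below is a minimal illustration of that idea, not the authors' implementation: it uses a softmax controller on a toy two-armed bandit (where a "path" is a single action), estimates the metric as the empirical Fisher matrix of sampled score vectors, and takes a natural gradient step. The problem setup, function names, and constants are all illustrative assumptions.

```python
# Illustrative sketch of a covariant (natural) policy gradient step on a
# toy bandit; all names and constants are assumptions, not from the paper.
import numpy as np

rng = np.random.default_rng(0)
rewards = np.array([1.0, 0.0])        # expected reward of each arm (toy choice)

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def sample_episode(theta):
    """One-step 'path': sample an action, observe reward, return score vector."""
    p = softmax(theta)
    a = rng.choice(len(p), p=p)
    r = rewards[a] + 0.1 * rng.standard_normal()
    score = -p
    score[a] += 1.0                   # grad of log pi(a) for a softmax policy
    return r, score

def covariant_update(theta, n_samples=500, lr=0.5, ridge=1e-3):
    grad = np.zeros_like(theta)
    fisher = np.zeros((len(theta), len(theta)))
    for _ in range(n_samples):
        r, s = sample_episode(theta)
        grad += r * s                 # REINFORCE-style gradient estimate
        fisher += np.outer(s, s)      # empirical Fisher information metric
    grad /= n_samples
    # Ridge term keeps the (singular, for softmax) Fisher matrix invertible.
    fisher = fisher / n_samples + ridge * np.eye(len(theta))
    return theta + lr * np.linalg.solve(fisher, grad)   # covariant ascent step

theta = np.zeros(2)
for _ in range(20):
    theta = covariant_update(theta)
print("final action probabilities:", softmax(theta))
```

Because the update is expressed through the metric rather than the raw coordinates, reparameterizing the controller (for example, rescaling one component of theta) leaves the direction of ascent on the underlying distribution unchanged, which is the covariance property the non-covariant rule lacks.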

BibTeX

@conference{Bagnell-2003-8727,
author = {J. Andrew (Drew) Bagnell and Jeff Schneider},
title = {Covariant Policy Search},
booktitle = {Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI '03)},
year = {2003},
month = {August},
pages = {1019--1024},
keywords = {Reinforcement Learning, REINFORCE, Natural Gradient, Covariant},
}