PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration

Yuda Song and Wen Sun

Conference Paper, Proceedings of (ICML) International Conference on Machine Learning, pp. 9801 - 9811, July, 2021

Abstract

Model-based Reinforcement Learning (RL) is a popular learning paradigm due to its potential sample efficiency compared to model-free RL. However, existing empirical model-based RL approaches lack the ability to explore. This work studies a computationally and statistically efficient model-based algorithm for both Kernelized Nonlinear Regulators (KNR) and linear Markov Decision Processes (MDPs). For both models, our algorithm guarantees polynomial sample complexity and only uses access to a planning oracle. Experimentally, we first demonstrate the flexibility and efficacy of our algorithm on a set of exploration-challenging control tasks where existing empirical model-based RL approaches completely fail. We then show that our approach retains excellent performance even in common dense-reward control benchmarks that do not require heavy exploration. Finally, we demonstrate that our method can also perform reward-free exploration efficiently. Our code can be found at https://github.com/yudasong/PCMLP.

BibTeX

@conference{Song-2021-130980,
author = {Yuda Song and Wen Sun},
title = {PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration},
booktitle = {Proceedings of (ICML) International Conference on Machine Learning},
year = {2021},
month = {July},
pages = {9801 - 9811},
}

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.