Inverse Reinforcement Learning with Explicit Policy Estimates

Navyata Sanghvi, Shinnosuke Usami, Mohit Sharma, Joachim Groeger, and Kris Kitani

Conference Paper, Proceedings of 35th AAAI Conference on Artificial Intelligence (AAAI '21), pp. 8398-8400, February 2021

Abstract

Various methods for solving the inverse reinforcement learning (IRL) problem have been developed independently in machine learning and economics. In particular, the method of Maximum Causal Entropy IRL is based on the perspective of entropy maximization, while related advances in the field of economics instead assume the existence of unobserved action shocks to explain expert behavior (Nested Fixed Point Algorithm, Conditional Choice Probability method, Nested Pseudo-Likelihood Algorithm). In this work, we make previously unknown connections between these related methods from both fields. We achieve this by showing that they all belong to a class of optimization problems, characterized by a common form of the objective, the associated policy and the objective gradient. We demonstrate key computational and algorithmic differences which arise between the methods due to an approximation of the optimal soft value function, and describe how this leads to more efficient algorithms. Using insights which emerge from our study of this class of optimization problems, we identify various problem scenarios and investigate each method's suitability for these problems.

BibTeX

@conference{Sanghvi-2021-126756,
author = {Navyata Sanghvi and Shinnosuke Usami and Mohit Sharma and Joachim Groeger and Kris Kitani},
title = {Inverse Reinforcement Learning with Explicit Policy Estimates},
booktitle = {Proceedings of 35th AAAI Conference on Artificial Intelligence (AAAI '21)},
year = {2021},
month = {February},
pages = {8398--8400},
}