Towards Better Interpretability in Deep Q-Networks

Raghuram Mandyam Annasamy and Katia P. Sycara

Conference Paper, Proceedings of 33rd National Conference on Artificial Intelligence (AAAI '19), pp. 4561-4569, January 2019

Abstract

Deep reinforcement learning techniques have demonstrated superior performance in a wide variety of environments. Although training algorithms continue to improve at a brisk pace, theoretical and empirical studies of what these networks seem to learn lag far behind. In this paper, we propose an interpretable neural network architecture for Q-learning that provides a global explanation of the model's behavior using key-value memories, attention, and reconstructible embeddings. With a directed exploration strategy, our model can reach training rewards comparable to those of state-of-the-art deep Q-learning models. However, results suggest that the features extracted by the neural network are extremely shallow, and subsequent testing using out-of-sample examples shows that the agent can easily overfit to trajectories seen during training.

BibTeX

@conference{Annasamy-2019-120831,
author = {Raghuram Mandyam Annasamy and Katia P. Sycara},
title = {Towards Better Interpretability in Deep Q-Networks},
booktitle = {Proceedings of 33rd National Conference on Artificial Intelligence (AAAI '19)},
year = {2019},
month = {January},
pages = {4561--4569},
}

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.