Active Exploration in Dynamic Environments

Sebastian Thrun and K. Moeller

Conference Paper, Proceedings of (NeurIPS) Neural Information Processing Systems, pp. 531-538, December 1991

Abstract

Whenever an agent learns to control an unknown environment, two opposing principles have to be combined, namely: exploration (long-term optimization) and exploitation (short-term optimization). Many real-valued connectionist approaches to learning control realize exploration by randomness in action selection. This might be disadvantageous when costs are assigned to "negative experiences". The basic idea presented in this paper is to make an agent explore unknown regions in a more directed manner. This is achieved by a so-called competence map, which is trained to predict the controller's accuracy, and is used for guiding exploration. Based on this, a bistable system enables smoothly switching attention between two behaviors -- exploration and exploitation -- depending on expected costs and knowledge gain. The appropriateness of this method is demonstrated by a simple robot navigation task.
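The idea of guiding exploration with a competence map can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract describes a connectionist competence map and a bistable switching system, while the code below substitutes a tabular running average and a fixed blending weight. All names (`CompetenceMap`, `select_state`, `explore_weight`) and the example costs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CompetenceMap:
    """Tabular stand-in (an assumption) for a learned competence estimator.

    Stores, per state, a running estimate of the controller's prediction error.
    """
    rate: float = 0.5           # step size of the running average
    default_error: float = 1.0  # unvisited states are assumed poorly modeled
    errors: dict = field(default_factory=dict)

    def predicted_error(self, state):
        return self.errors.get(state, self.default_error)

    def update(self, state, observed_error):
        old = self.predicted_error(state)
        self.errors[state] = old + self.rate * (observed_error - old)


def select_state(candidates, expected_cost, competence, explore_weight):
    """Pick the candidate with the best blend of knowledge gain and low cost.

    explore_weight in [0, 1] is a fixed blend, a simplification of the
    switching between exploration and exploitation named in the abstract.
    """
    def score(state):
        gain = competence.predicted_error(state)  # high error = much to learn
        return explore_weight * gain - (1.0 - explore_weight) * expected_cost[state]
    return max(candidates, key=score)


# Hypothetical example: state 0 is well modeled, states 1 and 2 are unvisited.
cmap = CompetenceMap()
cmap.update(0, 0.0)  # estimated error for state 0 drops from 1.0 to 0.5
cmap.update(0, 0.0)  # and then to 0.25
costs = {0: 0.0, 1: 0.2, 2: 0.9}

exploit_choice = select_state([0, 1, 2], costs, cmap, explore_weight=0.0)
mixed_choice = select_state([0, 1, 2], costs, cmap, explore_weight=0.5)
```

With `explore_weight=0.0` the cheapest, already familiar state is chosen. With `explore_weight=0.5` the choice moves to an unvisited state that is still inexpensive. This shows directed exploration that takes expected costs into account, in contrast to random action selection.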

BibTeX

@conference{Thrun-1991-15854,
author = {Sebastian Thrun and K. Moeller},
title = {Active Exploration in Dynamic Environments},
booktitle = {Proceedings of (NeurIPS) Neural Information Processing Systems},
year = {1991},
month = {December},
pages = {531--538},
publisher = {Morgan Kaufmann},
}

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.