Operant conditioning in Skinnerbots

Journal Article, Adaptive Behavior, Vol. 5, No. 3, pp. 219 - 247, 1997

Abstract

Instrumental (or operant) conditioning, a form of animal learning, is similar to reinforcement learning (Watkins, 1989) in that it allows an agent to adapt its actions to gain maximally from the environment while being rewarded only for correct performance. However, animals learn much more complicated behaviors through instrumental conditioning than robots presently acquire through reinforcement learning. We describe a new computational model of the conditioning process that attempts to capture some of the aspects that are missing from simple reinforcement learning: conditioned reinforcers, shifting reinforcement contingencies, explicit action sequencing, and state space refinement. We apply our model to a task commonly used to study working memory in rats and monkeys: the delayed match-to-sample task. Animals learn this task in stages. In simulation, our model also acquires the task in stages, in a similar manner. We have used the model to train an RWI B21 robot.

BibTeX

@article{Touretzky-1997-16407,
author = {David S. Touretzky and Lisa Saksida},
title = {Operant conditioning in Skinnerbots},
journal = {Adaptive Behavior},
year = {1997},
month = {January},
volume = {5},
number = {3},
pages = {219--247},
}

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.