Online learning of robot soccer free kick plans using a bandit approach

Juan Pablo Mendoza, Reid Simmons, and Manuela Veloso

Conference Paper, Proceedings of 26th International Conference on Automated Planning and Scheduling (ICAPS '16), pp. 504 - 508, June, 2016

Abstract

This paper presents an online learning approach for teams of autonomous soccer robots to select free kick plans. In robot soccer, free kicks present an opportunity to execute plans with relatively controllable initial conditions. However, the effectiveness of each plan is highly dependent on the adversary, and there are few free kicks during each game, making it necessary to learn online from sparse observations. To achieve learning, we first greatly reduce the planning space by framing the problem as a contextual multi-armed bandit problem, in which the actions are a set of pre-computed plans, and the state is the position of the free kick on the field. During execution, we model the reward function for different free kicks using Gaussian Processes, and perform online learning using the Upper Confidence Bound algorithm. Results from a physics-based simulation reveal that the robots are capable of adapting to a variety of realistic opponents to maximize their expected reward during free kicks.
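The selection loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one independent Gaussian Process per pre-computed plan, a squared-exponential kernel over the 2D free kick position, a zero prior mean with unit prior variance, and hypothetical values for the noise level, length scale, and exploration weight `beta`. The names `PlanGP` and `select_plan` are invented for this illustration.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two field positions (x, y)."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-d2 / (2.0 * length ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

class PlanGP:
    """GP regression of reward over kick position for a single plan.

    Assumed for illustration: zero prior mean, unit prior variance,
    and hand-picked noise / length-scale values.
    """
    def __init__(self, noise=0.1, length=1.0):
        self.X, self.y = [], []
        self.noise, self.length = noise, length

    def update(self, pos, reward):
        """Record the reward observed after running this plan at pos."""
        self.X.append(pos)
        self.y.append(reward)

    def predict(self, pos):
        """Return the posterior (mean, variance) of the reward at pos."""
        if not self.X:
            return 0.0, 1.0  # no data yet: fall back to the prior
        n = len(self.X)
        K = [[rbf(self.X[i], self.X[j], self.length)
              + (self.noise ** 2 if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        ks = [rbf(pos, xi, self.length) for xi in self.X]
        mean = sum(k * a for k, a in zip(ks, solve(K, self.y)))
        var = 1.0 - sum(k * v for k, v in zip(ks, solve(K, ks)))
        return mean, max(var, 0.0)

def select_plan(gps, pos, beta=2.0):
    """Pick the plan index maximizing mean + beta * std at pos (UCB)."""
    scores = []
    for gp in gps:
        mean, var = gp.predict(pos)
        scores.append(mean + beta * math.sqrt(var))
    return max(range(len(scores)), key=scores.__getitem__)

# Usage: two hypothetical plans, rewards observed at one kick position.
gps = [PlanGP(), PlanGP()]
kick = (1.0, 1.0)
gps[0].update(kick, 0.0)         # plan 0 failed here
choice = select_plan(gps, kick)  # the untried plan 1 now has the higher bound
gps[choice].update(kick, 1.0)    # plan 1 scored
```

Because each plan's bound combines the predicted reward with its uncertainty, an untried plan is favored until evidence accumulates, and observations at one field position inform nearby positions through the kernel. That sharing across positions is what makes learning from only a few free kicks per game plausible.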

BibTeX

@conference{Mendoza-2016-122735,
author = {Juan Pablo Mendoza and Reid Simmons and Manuela Veloso},
title = {Online learning of robot soccer free kick plans using a bandit approach},
booktitle = {Proceedings of 26th International Conference on Automated Planning and Scheduling (ICAPS '16)},
year = {2016},
month = {June},
pages = {504--508},
}
