Multi-armed bandit algorithms for spare time planning of a mobile service robot
Abstract
We assume that service robots will have spare time between scheduled user requests, which they could use to perform additional, unrequested services in order to learn a model of users' preferences and receive reward. However, a mobile service robot is constrained by the need to travel through the environment to reach a user in order to perform a service for them, as well as by the need to carry out scheduled user requests. We present modified versions of Thompson Sampling and UCB1, existing algorithms for multi-armed bandit problems, that plan ahead while accounting for the time and location constraints of a mobile service robot. We compare them to the original versions of Thompson Sampling and UCB1 and find, in simulation, that our modified planning algorithms outperform the originals in terms of both reward received and the effectiveness of the learned model.
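For reference, the sketch below shows the standard (unmodified) Beta-Bernoulli Thompson Sampling and UCB1 bandit algorithms that the paper builds on; the constraint-aware planning variants described in the abstract are not reproduced here, and the class and method names are illustrative assumptions rather than the authors' code.

```python
import math
import random

class ThompsonSampling:
    """Beta-Bernoulli Thompson Sampling: sample each arm's posterior
    and pull the arm with the highest sample."""
    def __init__(self, n_arms):
        self.successes = [1] * n_arms  # Beta(1, 1) uniform prior
        self.failures = [1] * n_arms

    def select_arm(self):
        samples = [random.betavariate(s, f)
                   for s, f in zip(self.successes, self.failures)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # reward is assumed to be 0 or 1 (Bernoulli).
        if reward:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1


class UCB1:
    """UCB1: pull the arm maximizing empirical mean reward plus an
    exploration bonus that shrinks as the arm is pulled more often."""
    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms
        self.total_pulls = 0

    def select_arm(self):
        # Play each arm once before applying the UCB rule.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        return max(range(len(self.counts)),
                   key=lambda a: self.values[a]
                   + math.sqrt(2 * math.log(self.total_pulls) / self.counts[a]))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.total_pulls += 1
        # Incremental update of the empirical mean reward for this arm.
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

The paper's contribution, per the abstract, is to modify these selection rules so that arm choices account for the robot's travel time between users and its scheduled requests, rather than treating every arm as immediately available.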
BibTeX
@conference{Korein-2018-122718,
  author = {Max Korein and Manuela Veloso},
  title = {Multi-armed bandit algorithms for spare time planning of a mobile service robot},
  booktitle = {Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS '18)},
  year = {2018},
  month = {July},
  pages = {2195--2197},
}