Multi-armed bandit algorithms for spare time planning of a mobile service robot
Abstract
We assume that service robots will have spare time between scheduled user requests, which they could use to perform additional, unrequested services in order to learn a model of users' preferences and receive reward. However, a mobile service robot is constrained by the need to travel through the environment to reach a user in order to perform a service for them, as well as by the need to carry out scheduled user requests. We present modified versions of Thompson Sampling and UCB1, existing algorithms for multi-armed bandit problems, that plan ahead while accounting for the time and location constraints of a mobile service robot. We compare them to the original versions of Thompson Sampling and UCB1 and find, in simulation, that our modified planning algorithms outperform the originals in terms of both reward received and the effectiveness of the learned model.
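For reference, the sketch below shows the standard (unmodified) Beta-Bernoulli Thompson Sampling and UCB1 bandit algorithms that the paper builds on; the constraint-aware planning variants described in the abstract are not reproduced here, and the class and method names are illustrative assumptions rather than the authors' code.

```python
import math
import random

class ThompsonSampling:
    """Beta-Bernoulli Thompson Sampling: sample each arm's posterior
    and pull the arm with the highest sample."""
    def __init__(self, n_arms):
        self.successes = [1] * n_arms  # Beta(1, 1) uniform prior
        self.failures = [1] * n_arms

    def select_arm(self):
        samples = [random.betavariate(s, f)
                   for s, f in zip(self.successes, self.failures)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # reward is assumed to be 0 or 1 (Bernoulli).
        if reward:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1


class UCB1:
    """UCB1: pull the arm maximizing empirical mean reward plus an
    exploration bonus that shrinks as the arm is pulled more often."""
    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms
        self.total_pulls = 0

    def select_arm(self):
        # Play each arm once before applying the UCB rule.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        return max(range(len(self.counts)),
                   key=lambda a: self.values[a]
                   + math.sqrt(2 * math.log(self.total_pulls) / self.counts[a]))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.total_pulls += 1
        # Incremental update of the empirical mean reward for this arm.
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

The paper's contribution, per the abstract, is to modify these selection rules so that arm choices account for the robot's travel time between users and its scheduled requests, rather than treating every arm as immediately available.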
BibTeX
@conference{Korein-2018-122718,
  author = {Max Korein and Manuela Veloso},
  title = {Multi-armed bandit algorithms for spare time planning of a mobile service robot},
  booktitle = {Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS '18)},
  year = {2018},
  month = {July},
  pages = {2195--2197},
}