Multi-Armed Bandit Algorithms for a Mobile Service Robot’s Spare Time in a Structured Environment
Abstract
We assume that service robots will have spare time between scheduled user requests, which they could use to perform additional, unrequested services in order to learn a model of users' preferences and receive reward. However, a mobile service robot is constrained by the need to travel through the environment to reach a user before performing a service for them, as well as by the need to carry out scheduled user requests. We assume service robots operate in structured environments composed of hallways and floors; this structure also affects the robot's ability to plan and learn, since many scenarios arise in which an office along the robot's path can be added to a plan at low cost. We present two algorithms, Planning Thompson Sampling and Planning UCB1, which are based on existing multi-armed bandit algorithms but modified to plan ahead given the time and location constraints of the problem. We compare them to the existing versions of Thompson Sampling and UCB1 in two environments whose structures are representative of those a robot would encounter in an office building. In simulation, we find that our planning algorithms outperform the original versions in terms of both reward received and the effectiveness of the learned model, in part because the original algorithms frequently miss opportunities to perform services at low cost for convenient offices along their path.
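For context, the two baseline bandit strategies the paper builds on can be sketched as follows. This is a minimal illustrative sketch of standard UCB1 and Bernoulli Thompson Sampling arm selection only; the paper's Planning variants additionally account for travel time and office locations, which are not modeled here, and all function and variable names are our own.

```python
import math
import random

def ucb1_select(counts, rewards, t):
    """Standard UCB1: pick the arm maximizing empirical mean reward
    plus the exploration bonus sqrt(2 ln t / n_i)."""
    # Play any arm that has never been tried before applying the bound.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    def ucb(i):
        return rewards[i] / counts[i] + math.sqrt(2.0 * math.log(t) / counts[i])
    return max(range(len(counts)), key=ucb)

def thompson_select(successes, failures):
    """Standard Thompson Sampling for Bernoulli rewards: sample a
    success probability from each arm's Beta posterior (uniform prior)
    and pick the arm with the largest sample."""
    samples = [random.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)
```

Both routines choose one "arm" (here, one candidate service) per decision; the paper's contribution is to replace this single-arm choice with a plan over a sequence of offices subject to time and location constraints.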
BibTeX
@conference{Korein-2018-122714,
  author    = {Max Korein and Manuela Veloso},
  title     = {Multi-Armed Bandit Algorithms for a Mobile Service Robot's Spare Time in a Structured Environment},
  booktitle = {Proceedings of the 4th Global Conference on Artificial Intelligence (GCAI '18)},
  year      = {2018},
  month     = {September},
  pages     = {121--133},
}