A Simple Distribution-Free Approach to the Max k-Armed Bandit Problem

M. J. Streeter and S. F. Smith

Conference Paper, Proceedings 12th International Conference on Principles and Practice of Constraint Programming (CP '06), pp. 560 - 574, September, 2006

Abstract

The max k-armed bandit problem is a recently-introduced online optimization problem with practical applications to heuristic search. Given a set of k slot machines, each yielding payoff from a fixed (but unknown) distribution, we wish to allocate trials to the machines so as to maximize the maximum payoff received over a series of n trials. Previous work on the max k-armed bandit problem has assumed that payoffs are drawn from generalized extreme value (GEV) distributions. In this paper we present a simple algorithm, based on an algorithm for the classical k-armed bandit problem, that solves the max k-armed bandit problem effectively without making strong distributional assumptions. We demonstrate the effectiveness of our approach by applying it to the task of selecting among priority dispatching rules for the resource-constrained project scheduling problem with maximal time lags (RCPSP/max).

BibTeX

@conference{Streeter-2006-120492,
author = {M. J. Streeter and S. F. Smith},
title = {A Simple Distribution-Free Approach to the Max k-Armed Bandit Problem},
booktitle = {Proceedings 12th International Conference on Principles and Practice of Constraint Programming (CP '06)},
year = {2006},
month = {September},
pages = {560 - 574},
}

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.