An Asymptotically Optimal Algorithm for the Max k-Armed Bandit Problem
Abstract
We present an asymptotically optimal algorithm for the max variant of the k-armed bandit problem. Given a set of k slot machines, each yielding payoff from a fixed (but unknown) distribution, we wish to allocate trials to the machines so as to maximize the expected maximum payoff received over a series of n trials. Subject to certain distributional assumptions, we show that O(k ln(k/δ) ln(n)2/ε2) trials are sufficient to identify, with probability at least 1 , a machine whose expected maximum payoff is within of optimal. This result leads to a strategy for solving the problem that is asymptotically optimal in the following sense: the gap between the expected maximum payoff obtained by using our strategy for n trials and that obtained by pulling the single best arm for all n trials approaches zero as n → ∞.
BibTeX
@conference{Streeter-2006-120469,author = {M. J. Streeter and S. F. Smith},
title = {An Asymptotically Optimal Algorithm for the Max k-Armed Bandit Problem},
booktitle = {Proceedings of 21st National Conference on Artificial Intelligence (AAAI '06) and 18th Innovative Applications of Artificial Intelligence (IAAI '06)},
year = {2006},
month = {July},
pages = {135 - 142},
}