An Asymptotically Optimal Algorithm for the Max k-Armed Bandit Problem

M. J. Streeter and S. F. Smith

Conference Paper, Proceedings of 21st National Conference on Artificial Intelligence (AAAI '06) and 18th Innovative Applications of Artificial Intelligence (IAAI '06), pp. 135 - 142, July, 2006

Abstract

We present an asymptotically optimal algorithm for the max variant of the k-armed bandit problem. Given a set of k slot machines, each yielding payoff from a fixed (but unknown) distribution, we wish to allocate trials to the machines so as to maximize the expected maximum payoff received over a series of n trials. Subject to certain distributional assumptions, we show that O(k ln(k/δ) ln(n)2/ε2) trials are sufficient to identify, with probability at least 1 , a machine whose expected maximum payoff is within of optimal. This result leads to a strategy for solving the problem that is asymptotically optimal in the following sense: the gap between the expected maximum payoff obtained by using our strategy for n trials and that obtained by pulling the single best arm for all n trials approaches zero as n → ∞.

BibTeX

@conference{Streeter-2006-120469,
author = {M. J. Streeter and S. F. Smith},
title = {An Asymptotically Optimal Algorithm for the Max k-Armed Bandit Problem},
booktitle = {Proceedings of 21st National Conference on Artificial Intelligence (AAAI '06) and 18th Innovative Applications of Artificial Intelligence (IAAI '06)},
year = {2006},
month = {July},
pages = {135 - 142},
}

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.