An Asymptotically Optimal Algorithm for the Max k-Armed Bandit Problem - Robotics Institute Carnegie Mellon University

An Asymptotically Optimal Algorithm for the Max k-Armed Bandit Problem

M. J. Streeter and S. F. Smith
Conference Paper, Proceedings of 21st National Conference on Artificial Intelligence (AAAI '06) and 18th Innovative Applications of Artificial Intelligence (IAAI '06), pp. 135 - 142, July, 2006

Abstract

We present an asymptotically optimal algorithm for the max variant of the k-armed bandit problem. Given a set of k slot machines, each yielding payoff from a fixed (but unknown) distribution, we wish to allocate trials to the machines so as to maximize the expected maximum payoff received over a series of n trials. Subject to certain distributional assumptions, we show that O(k ln(k/δ) ln(n)2/ε2) trials are sufficient to identify, with probability at least 1 , a machine whose expected maximum payoff is within of optimal. This result leads to a strategy for solving the problem that is asymptotically optimal in the following sense: the gap between the expected maximum payoff obtained by using our strategy for n trials and that obtained by pulling the single best arm for all n trials approaches zero as n → ∞.

BibTeX

@conference{Streeter-2006-120469,
author = {M. J. Streeter and S. F. Smith},
title = {An Asymptotically Optimal Algorithm for the Max k-Armed Bandit Problem},
booktitle = {Proceedings of 21st National Conference on Artificial Intelligence (AAAI '06) and 18th Innovative Applications of Artificial Intelligence (IAAI '06)},
year = {2006},
month = {July},
pages = {135 - 142},
}