Online Bellman Residual Algorithms with Predictive Error Guarantees
Abstract
We establish a connection between optimizing the Bellman residual and worst-case long-term predictive error. In the online learning framework, learning takes place over a sequence of trials with the goal of predicting a future discounted sum of rewards. Our analysis shows that, together with a stability assumption, any no-regret online learning algorithm that minimizes Bellman error ensures small prediction error. No statistical assumptions are made on the sequence of observations, which could be non-Markovian or even adversarial. Moreover, the analysis is independent of the particular form of function approximation and of the particular (stable) no-regret approach taken. Our approach thus establishes a broad new family of provably sound algorithms for Bellman residual-based learning and provides a generalization of previous worst-case results for minimizing predictive error. We investigate the potential advantages of some members of this family both theoretically and empirically on benchmark problems.
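To make the family concrete, the following is a minimal sketch of one of its simplest members: online gradient descent on the squared Bellman residual with linear function approximation. The function name, feature map, step size, and discount factor are illustrative assumptions, not the paper's exact construction.

import numpy as np

def residual_gradient_step(w, phi_t, r_t, phi_next, gamma=0.99, eta=0.01):
    """One online-gradient step on the squared Bellman residual.

    w        -- weights of a linear predictor f_w(x) = w . phi(x)
    phi_t    -- feature vector of the current observation
    r_t      -- reward observed on this trial
    phi_next -- feature vector of the next observation
    """
    # Bellman residual: f_w(x_t) - (r_t + gamma * f_w(x_{t+1}))
    delta = w @ phi_t - (r_t + gamma * w @ phi_next)
    # Gradient of 0.5 * delta**2 with respect to w
    return w - eta * delta * (phi_t - gamma * phi_next)

# Illustrative usage on an arbitrary observation stream (no statistical
# assumptions on the stream are required by the analysis):
rng = np.random.default_rng(0)
w, phi = np.zeros(4), rng.normal(size=4)
for t in range(1000):
    phi_next = rng.normal(size=4)
    r = float(phi.sum())  # placeholder reward signal
    w = residual_gradient_step(w, phi, r, phi_next)
    phi = phi_next

Since online gradient descent is a stable no-regret learner, a sketch of this kind falls within the family of algorithms whose prediction-error guarantee the paper establishes.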
BibTeX
@conference{Sun-2015-6001,
author = {Wen Sun and J. Andrew (Drew) Bagnell},
title = {Online Bellman Residual Algorithms with Predictive Error Guarantees},
booktitle = {Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI '15)},
year = {2015},
month = {July},
pages = {852--861},
}