Robotics Institute, Carnegie Mellon University

Online Bellman Residual Algorithms with Predictive Error Guarantees

Conference Paper, Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI '15), pp. 852-861, July 2015

Abstract

We establish a connection between optimizing the Bellman Residual and worst-case long-term predictive error. In the online learning framework, learning takes place over a sequence of trials with the goal of predicting a future discounted sum of rewards. Our analysis shows that, together with a stability assumption, any no-regret online learning algorithm that minimizes Bellman error ensures small prediction error. No statistical assumptions are made on the sequence of observations, which could be non-Markovian or even adversarial. Moreover, the analysis is independent of the particular form of function approximation and of the particular (stable) no-regret approach taken. Our approach thus establishes a broad new family of provably sound algorithms for Bellman Residual-based learning and provides a generalization of previous worst-case results for minimizing predictive error. We investigate the potential advantages of some members of this family both theoretically and empirically on benchmark problems.
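To make the object of study concrete, the following is a minimal illustrative sketch of online minimization of the squared Bellman residual with a linear value function, using a plain residual-gradient update. This is an assumption-laden toy, not the paper's algorithms: the paper's results cover any stable no-regret learner on this loss, and all names, the toy chain, and the step size below are hypothetical choices for illustration.

```python
import numpy as np

def residual_gradient_step(w, phi, r, phi_next, gamma, lr):
    """One online update on the squared Bellman residual
    delta = w.phi - (r + gamma * w.phi_next) for a linear value function.
    (Illustrative residual-gradient sketch, not the paper's specific method.)"""
    delta = w @ phi - (r + gamma * (w @ phi_next))
    # Gradient of 0.5 * delta**2 with respect to w is delta * (phi - gamma * phi_next)
    return w - lr * delta * (phi - gamma * phi_next)

# Toy chain: one state with feature 1.0 and constant reward 1, gamma = 0.9.
# The Bellman fixed point is V = 1 / (1 - 0.9) = 10.
gamma, lr = 0.9, 5.0
w = np.zeros(1)
phi = np.ones(1)
for _ in range(200):
    w = residual_gradient_step(w, phi, 1.0, phi, gamma, lr)
print(round(float(w[0]), 3))  # approaches 10.0
```

On this single-state chain the update contracts toward the Bellman fixed point; the paper's contribution is a regret-based guarantee that such Bellman-error minimization, under a stability assumption, controls worst-case predictive error even on non-Markovian or adversarial observation sequences.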

BibTeX

@conference{Sun-2015-6001,
author = {Wen Sun and J. Andrew (Drew) Bagnell},
title = {Online Bellman Residual Algorithms with Predictive Error Guarantees},
booktitle = {Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI '15)},
year = {2015},
month = {July},
pages = {852--861},
}