Online Bellman Residual and Temporal Difference Algorithms with Predictive Error Guarantees

Wen Sun and J. Andrew (Drew) Bagnell

Conference Paper, Proceedings of 25th International Joint Conference on Artificial Intelligence (IJCAI '16), pp. 4213-4217, July 2016

Abstract

We establish connections from optimizing Bellman residual and temporal difference loss to worst-case long-term predictive error. In the online learning framework, learning takes place over a sequence of trials with the goal of predicting a future discounted sum of rewards. Our first analysis shows that, together with a stability assumption, any no-regret online learning algorithm that minimizes Bellman error ensures small prediction error. Our second analysis shows that applying the family of online mirror descent algorithms to temporal difference loss also ensures small prediction error. No statistical assumptions are made on the sequence of observations, which could be non-Markovian or even adversarial. Our approach thus establishes a broad new family of provably sound algorithms and generalizes previous worst-case results for minimizing predictive error. We investigate the potential advantages of some members of this family, both theoretically and empirically, on benchmark problems.
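
To make the two update styles named in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes a linear predictor `w · x`, a squared Bellman residual loss minimized by plain online gradient descent (one common no-regret learner), and a TD(0)-style update as the Euclidean special case of mirror descent. The function name `online_value_prediction`, the `mode` flag, the step size `eta`, and the discount `gamma` are all hypothetical choices for illustration; the abstract does not specify the paper's exact losses or step-size schedules.

```python
from typing import Iterable, List, Sequence, Tuple

# One observed trial: (features x_t, reward r_t, next features x_{t+1}).
Transition = Tuple[Sequence[float], float, Sequence[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def online_value_prediction(
    transitions: Iterable[Transition],
    gamma: float = 0.9,   # discount factor (assumed value)
    eta: float = 0.1,     # constant step size (assumed; not from the paper)
    mode: str = "residual",
) -> List[float]:
    """Run one pass of online updates for a linear value predictor.

    mode="residual": gradient step on (w.x_t - r_t - gamma * w.x_{t+1})^2.
    mode="td":       TD(0)-style step that moves w along x_t only.
    """
    if mode not in ("residual", "td"):
        raise ValueError("mode must be 'residual' or 'td'")

    w: List[float] = []
    for x, r, x_next in transitions:
        if not w:
            w = [0.0] * len(x)  # start from the zero predictor

        # Bellman error of the current predictor on this trial.
        delta = dot(w, x) - r - gamma * dot(w, x_next)

        if mode == "residual":
            # Full gradient of the squared Bellman residual w.r.t. w.
            direction = [2.0 * (xi - gamma * xni) for xi, xni in zip(x, x_next)]
        else:
            # Semi-gradient: treat the bootstrapped target as a constant.
            direction = list(x)

        w = [wi - eta * delta * di for wi, di in zip(w, direction)]
    return w


# Toy stream: a single state with feature [1.0] and reward 1.0 on every trial.
# With gamma = 0.5 the discounted sum of rewards is 1 / (1 - 0.5) = 2.
stream = [([1.0], 1.0, [1.0])] * 500
w_residual = online_value_prediction(stream, gamma=0.5, eta=0.1, mode="residual")
w_td = online_value_prediction(stream, gamma=0.5, eta=0.1, mode="td")
```

On this toy stream both variants settle at a weight close to 2, the true discounted return. The stream here is stationary only for ease of checking; the guarantees described in the abstract are stated for arbitrary, possibly adversarial, sequences.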

BibTeX

@conference{Sun-2016-5496,
author = {Wen Sun and J. Andrew (Drew) Bagnell},
title = {Online Bellman Residual and Temporal Difference Algorithms with Predictive Error Guarantees},
booktitle = {Proceedings of 25th International Joint Conference on Artificial Intelligence (IJCAI '16)},
year = {2016},
month = {July},
pages = {4213--4217},
}