The Binomial Block Bootstrap Estimator for Evaluating Loss on Dependent Clusters - Robotics Institute Carnegie Mellon University

Matt Barnes and Artur Dubrawski
Conference Paper, Proceedings of 33rd Conference on Uncertainty in Artificial Intelligence (UAI '17), August, 2017

Abstract

In this paper, we study the non-IID learning setting where samples exhibit dependency within latent clusters. Our goal is to estimate a learner’s loss on new clusters, an extension of the out-of-bag error. Previously developed cross-validation estimators are well suited to the case where the clustering of observed data is known a priori. However, as is often the case in real-world problems, we are only given a noisy approximation of this clustering, likely the result of some clustering algorithm. This subtle yet potentially significant issue afflicts domains ranging from image classification to medical diagnostics, where naive cross-validation is an optimistically biased estimator. We present a novel bootstrap technique and corresponding cross-validation method that, somewhat counterintuitively, injects additional dependency to asymptotically recover the loss in the independent setting.
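
To make the known-clustering baseline concrete, the following is a minimal sketch of an ordinary cluster-level bootstrap split, in which whole clusters are resampled and the loss would be evaluated only on clusters left out of the bag. It is not the binomial block bootstrap proposed in the paper, whose details are not given in this abstract; it assumes the cluster labels are known exactly, and the function name `cluster_bootstrap_oob` is hypothetical.

```python
import random
from collections import defaultdict


def cluster_bootstrap_oob(cluster_ids, rng):
    """Resample whole clusters with replacement.

    Assumes cluster_ids[i] is the (exactly known) cluster of sample i.
    Returns (in_bag, out_of_bag) lists of sample indices. No cluster
    contributes samples to both lists, so dependent samples never
    straddle the train/evaluation split.
    """
    # Group sample indices by cluster label.
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)

    clusters = sorted(members)
    # Draw as many clusters as exist, with replacement.
    drawn = [rng.choice(clusters) for _ in clusters]
    drawn_set = set(drawn)

    # In-bag samples may repeat if their cluster was drawn more than once.
    in_bag = [i for cid in drawn for i in members[cid]]
    # Out-of-bag samples come only from clusters that were never drawn.
    out_of_bag = [i for cid in clusters if cid not in drawn_set
                  for i in members[cid]]
    return in_bag, out_of_bag


if __name__ == "__main__":
    ids = [0, 0, 1, 1, 2, 2, 3, 3]  # toy labels: four clusters of two samples
    in_bag, out_of_bag = cluster_bootstrap_oob(ids, random.Random(0))
    print("in-bag indices:", in_bag)
    print("out-of-bag indices:", out_of_bag)
```

A naive split would instead draw individual samples, so members of one cluster could land on both sides of the split, which is the source of the optimistic bias described above. When the labels passed as `cluster_ids` are themselves noisy, this sketch no longer guarantees that separation; that is the setting the paper addresses.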

BibTeX

@conference{Barnes-2017-121825,
author = {Matt Barnes and Artur Dubrawski},
title = {The Binomial Block Bootstrap Estimator for Evaluating Loss on Dependent Clusters},
booktitle = {Proceedings of 33rd Conference on Uncertainty in Artificial Intelligence (UAI '17)},
year = {2017},
month = {August},
}