The Binomial Block Bootstrap Estimator for Evaluating Loss on Dependent Clusters
Abstract
In this paper, we study the non-IID learning setting where samples exhibit dependency within latent clusters. Our goal is to estimate a learner’s loss on new clusters, an extension of the out-of-bag error. Previously developed cross-validation estimators are well suited to the case where the clustering of observed data is known a priori. However, as is often the case in real-world problems, we are given only a noisy approximation of this clustering, typically the output of some clustering algorithm. This subtle yet potentially significant issue afflicts domains ranging from image classification to medical diagnostics, where naive cross-validation is an optimistically biased estimator. We present a novel bootstrap technique and corresponding cross-validation method that, somewhat counterintuitively, injects additional dependency to asymptotically recover the loss in the independent setting.
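To make the contrast with naive cross-validation concrete, the following is a minimal sketch of the known-clustering baseline the abstract refers to: folds are assigned per cluster, so no cluster contributes samples to both the training and test sides of a split. It is not the binomial block bootstrap itself, whose details are not given here. The function name `cluster_kfold` and the example labels are hypothetical, chosen only for illustration.

```python
import random


def cluster_kfold(cluster_ids, n_folds, seed=0):
    """Yield (train_idx, test_idx) pairs with each cluster kept inside one fold.

    Illustrative helper (hypothetical name), not the paper's estimator.
    """
    clusters = sorted(set(cluster_ids))
    if n_folds > len(clusters):
        raise ValueError("n_folds cannot exceed the number of clusters")
    rng = random.Random(seed)
    rng.shuffle(clusters)
    # Deal the shuffled clusters round-robin into folds.
    fold_of = {c: i % n_folds for i, c in enumerate(clusters)}
    for fold in range(n_folds):
        test = [i for i, c in enumerate(cluster_ids) if fold_of[c] == fold]
        train = [i for i, c in enumerate(cluster_ids) if fold_of[c] != fold]
        yield train, test


# Hypothetical cluster labels for eight samples.
labels = ["a", "a", "b", "b", "b", "c", "d", "d"]
splits = list(cluster_kfold(labels, n_folds=2))
```

The separation holds only with respect to the labels supplied. If those labels are a noisy estimate of the true clustering, samples from one true cluster can still land on both sides of a split, which is the failure mode the abstract describes.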
BibTeX
@conference{Barnes-2017-121825,
author = {Matt Barnes and Artur Dubrawski},
title = {The Binomial Block Bootstrap Estimator for Evaluating Loss on Dependent Clusters},
booktitle = {Proceedings of 33rd Conference on Uncertainty in Artificial Intelligence (UAI '17)},
year = {2017},
month = {August},
}