Complete Cross-Validation for Nearest Neighbor Classifiers
Abstract
Cross-validation is an established technique for estimating the accuracy of a classifier and is normally performed either using a number of random test/train partitions of the data, or using k-fold cross-validation. We present a technique for calculating the complete cross-validation for nearest-neighbor classifiers: i.e., averaging over all desired test/train partitions of data. This technique is applied to several common variants, including K-nearest-neighbor classifiers, stratified data partitioning, and arbitrary loss functions. We demonstrate, with complexity analysis and experimental timing results, that the technique can be performed in time comparable to k-fold cross-validation, though in effect it averages an exponential number of trials. We show that the results of complete cross-validation have the same bias as those of subsampling and k-fold cross-validation, with some reduction in variance. This algorithm offers significant benefits in terms of both time and accuracy.
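The following minimal Python sketch illustrates why averaging over all partitions need not mean enumerating them. It covers only the simplest case: a 1-nearest-neighbor classifier, 0/1 loss, a fixed training-set size m, and no distance ties. It rests on a standard counting argument and is not the paper's published algorithm; the function names and the parameter m are hypothetical. For a held-out point, rank the other n-1 points by distance. The j-th nearest is the closest training point in exactly C(n-1-j, m-1) of the C(n-1, m) equally likely training sets, so the expected loss follows from one sorted pass per point. A brute-force enumeration is included only as a cross-check on small data.

```python
from itertools import combinations
from math import comb


def dist2(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def complete_cv_error_1nn(points, labels, m):
    """Mean 0/1 error of 1-NN over ALL partitions with m training points.

    Assumptions (illustrative only): 1-NN, 0/1 loss, fixed train size m.
    """
    n = len(points)
    if not 1 <= m < n:
        raise ValueError("m must satisfy 1 <= m < n")
    n_train_sets = comb(n - 1, m)  # training sets available to one held-out point
    total = 0.0
    for i in range(n):
        # Rank every other point by its distance to the held-out point i.
        ranked = sorted((j for j in range(n) if j != i),
                        key=lambda j: dist2(points[i], points[j]))
        for rank, j in enumerate(ranked, start=1):
            # Point j is the nearest training point iff it is in the training
            # set and none of the rank-1 closer points are.
            p = comb(n - 1 - rank, m - 1) / n_train_sets
            if labels[j] != labels[i]:
                total += p
    # By symmetry every point is held out equally often, so average over points.
    return total / n


def brute_force_cv_error_1nn(points, labels, m):
    """Same quantity by explicit enumeration; exponential, for checking only."""
    n = len(points)
    errors = []
    for train in combinations(range(n), m):
        train_set = set(train)
        test = [i for i in range(n) if i not in train_set]
        wrong = 0
        for i in test:
            nearest = min(train, key=lambda j: dist2(points[i], points[j]))
            wrong += labels[nearest] != labels[i]
        errors.append(wrong / len(test))
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Made-up toy data, chosen only to exercise the two functions.
    pts = [(0.0, 0.1), (1.0, 0.7), (2.5, 0.2), (4.5, 1.9), (7.0, 0.4), (7.6, 2.3)]
    lbl = [0, 0, 1, 0, 1, 1]
    print(complete_cv_error_1nn(pts, lbl, m=4))
    print(brute_force_cv_error_1nn(pts, lbl, m=4))
```

On small inputs the two functions should agree up to floating-point rounding. The closed-form version needs one sort per point, whereas the enumeration visits every one of the C(n, m) partitions.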
BibTeX
@conference{Mullin-2000-8050,
author = {Matthew Mullin and Rahul Sukthankar},
title = {Complete Cross-Validation for Nearest Neighbor Classifiers},
booktitle = {Proceedings of (ICML) International Conference on Machine Learning},
year = {2000},
month = {June},
pages = {639--646},
keywords = {machine learning},
}