On the Chance Accuracies of Large Collections of Classifiers

Mark Palatucci and Andrew Carlson
Conference Paper, Proceedings of the International Conference on Machine Learning (ICML), pp. 744-751, July 2008

Abstract

We provide a theoretical analysis of the chance accuracies of large collections of classifiers. We show that on problems with small numbers of examples, some classifier can perform well by random chance, and we derive a theorem to explicitly calculate this accuracy. We use this theorem to provide a principled feature selection criterion for sparse, high-dimensional problems. We evaluate this method on microarray and fMRI datasets and show that it performs very close to the optimal accuracy obtained from an oracle. We also show that on the fMRI dataset this technique chooses relevant features successfully while another state-of-the-art method, the False Discovery Rate (FDR), completely fails at standard significance levels.
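The effect described in the abstract can be illustrated with a small order-statistics calculation. The sketch below is not the paper's theorem; it assumes a simplified setting in which each of `n_classifiers` classifiers guesses independently on a balanced binary problem, so each one's number of correct predictions on `n_examples` examples follows a Binomial(`n_examples`, 0.5) distribution. The function names and the independence assumption are illustrative choices, not taken from the paper.

```python
from math import comb


def binom_cdf(k, n, p=0.5):
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    upper = min(k, n)
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(upper + 1))


def expected_max_accuracy(n_examples, n_classifiers, p=0.5):
    """Expected best accuracy among independent random-guess classifiers.

    Assumption (illustrative, not from the paper): classifiers are
    independent and each prediction is correct with probability p.
    Uses E[max] = sum_{k=1..n} P(max >= k), where
    P(max >= k) = 1 - F(k - 1) ** n_classifiers and F is the binomial CDF.
    """
    total = 0.0
    for k in range(1, n_examples + 1):
        total += 1.0 - binom_cdf(k - 1, n_examples, p) ** n_classifiers
    return total / n_examples


if __name__ == "__main__":
    # With few examples, the best of many random classifiers looks accurate.
    for m in (1, 100, 10_000):
        print(m, round(expected_max_accuracy(20, m), 3))
```

With a single classifier the expected accuracy is exactly 0.5. As the number of classifiers grows while the number of examples stays small, the expected best accuracy rises well above chance, which is why a selection criterion has to account for the size of the collection.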

BibTeX

@conference{Palatucci-2008-10029,
author = {Mark Palatucci and Andrew Carlson},
title = {On the Chance Accuracies of Large Collections of Classifiers},
booktitle = {Proceedings of the International Conference on Machine Learning (ICML)},
year = {2008},
month = {July},
pages = {744--751},
keywords = {order statistics, extreme value, feature selection, multiple hypothesis testing},
}