
Do We Need More Training Data or Better Models for Object Detection?

Xiangxin Zhu, Carl Vondrick, Deva Ramanan, and Charless C. Fowlkes
Conference Paper, Proceedings of British Machine Vision Conference (BMVC '12), September, 2012

Abstract

Datasets for training object recognition systems are steadily growing in size. This paper investigates the question of whether existing detectors will continue to improve as data grows, or whether models are close to saturating due to limited model complexity and the Bayes risk associated with the feature spaces in which they operate. We focus on the popular paradigm of scanning-window templates defined on oriented gradient features, trained with discriminative classifiers. We investigate the performance of mixtures of templates as a function of the number of templates (complexity) and the amount of training data. We find that additional data does help, but only with proper regularization and treatment of noisy examples or “outliers” in the training data. Surprisingly, the performance of problem domain-agnostic mixture models appears to saturate quickly (∼10 templates and ∼100 positive training examples per template). However, compositional mixtures (implemented via composed parts) give much better performance because they share parameters among templates and can synthesize new templates not encountered during training. This suggests there is still room to improve performance with linear classifiers and the existing feature space through better representations and learning algorithms.
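
As a concrete illustration of the scanning-window paradigm the abstract describes, the sketch below scores a single window first with a mixture of rigid linear templates and then with a compositional mixture assembled from a shared pool of part filters. This is a minimal sketch in Python/NumPy, not the authors' implementation: the hog_features stub, the assumed 6x6 cell grid with 31 feature bins, and the (part_index, row, col) composition format are placeholder assumptions made here for illustration.

import numpy as np

def hog_features(window):
    # Placeholder for an oriented-gradient (HOG-style) feature map.
    # A real detector would compute per-cell gradient histograms; the
    # 6x6 cell grid with 31 bins per cell is an assumed shape.
    return np.random.rand(6, 6, 31)

def score_rigid_mixture(window, templates):
    # Rigid mixture: each component is an independent linear template,
    # and the window score is the best single-component response.
    phi = hog_features(window).ravel()
    return max(w @ phi for w in templates)

def score_compositional_mixture(window, part_filters, compositions):
    # Compositional mixture: every component is assembled from a shared
    # pool of part filters placed at fixed cell offsets, so parameters
    # are shared across components and new templates can be synthesized
    # by recombining parts not seen together during training.
    feat = hog_features(window)
    best = float("-inf")
    for composition in compositions:          # e.g. [(part_idx, row, col), ...]
        score = 0.0
        for part_idx, row, col in composition:
            f = part_filters[part_idx]        # assumed shape (ph, pw, 31)
            ph, pw, _ = f.shape
            score += np.sum(f * feat[row:row + ph, col:col + pw, :])
        best = max(best, score)
    return best

A rigid mixture needs one full template per component, so its parameter count grows linearly with the number of components; the compositional variant reuses the same part filters across components, which is the parameter sharing the abstract credits for the better use of additional training data.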

BibTeX

@conference{Zhu-2012-121205,
  author = {Xiangxin Zhu and Carl Vondrick and Deva Ramanan and Charless C. Fowlkes},
  title = {Do We Need More Training Data or Better Models for Object Detection?},
  booktitle = {Proceedings of British Machine Vision Conference (BMVC '12)},
  year = {2012},
  month = {September},
}