Context and Subcategories for Sliding Window Object Recognition

PhD Thesis, Tech. Report, CMU-RI-TR-12-17, Robotics Institute, Carnegie Mellon University, August, 2012

Abstract

Object recognition is one of the fundamental challenges in computer vision, where the goal is to identify and localize the extent of object instances within an image. The current de facto standard for building high-performance object category detectors is the sliding window approach. This approach involves scanning an image with a fixed-size rectangular window and applying a classifier to the features extracted within the sub-image defined by the window. In this thesis, we study two important factors influencing the performance of the approach. The first is the role played by context, where information outside the sliding window is used to rescore the detections output by the local window classifier. Context helps to suppress detections in regions that are less likely to contain an object and encourages those that are more plausible. In the first part of this thesis, we enumerate different sources and uses of context, and comprehensively evaluate their role in a benchmark detection challenge. Our analysis demonstrates that carefully used contextual cues serve not only to improve the performance of local classifiers, but also to make their error patterns more meaningful and reasonable. Our analysis also provides a basis for assessing the inherent limitations of the existing approaches as well as the specific problems that remain unsolved. The second factor is the role played by subcategories, where information within the sliding window is used to split the training data into smaller groups, for learning multiple classifiers to model the appearance of an object category. The smaller groups have reduced appearance diversity and thus lead to simpler classification problems. In the second part of this thesis, we analyze different schemes to generate subcategories and find that unsupervised feature-space clustering produces well-performing subcategory classifiers. Beyond performance gains, subcategories are attractive for their conceptual simplicity and computational tractability. For example, we find that careful use of subcategories can potentially replace the need for deformable parts within the state-of-the-art deformable parts model detector for many object categories. Data fragmentation is an important problem associated with subcategory-based methods. We present a novel approach that circumvents this problem by allowing different subcategories to share each other’s training instances.
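As an illustration of the sliding window approach described in the abstract, the following is a minimal Python sketch. It is not taken from the thesis: the function name `sliding_window_detect`, the raw-pixel features, the `mean_brightness` scorer, and the threshold are all assumptions chosen to keep the example self-contained. A real detector would use richer features and a learned classifier.

```python
from typing import Callable, List, Sequence, Tuple

Image = Sequence[Sequence[float]]  # row-major grayscale image
Box = Tuple[int, int, int, int]    # (top, left, height, width)


def sliding_window_detect(
    image: Image,
    window: Tuple[int, int],
    stride: int,
    score_fn: Callable[[List[float]], float],
    threshold: float,
) -> List[Tuple[Box, float]]:
    """Scan a fixed-size window over the image; keep windows scoring above threshold."""
    win_h, win_w = window
    img_h, img_w = len(image), len(image[0])
    detections = []
    for top in range(0, img_h - win_h + 1, stride):
        for left in range(0, img_w - win_w + 1, stride):
            # Hypothetical feature extractor: the flattened raw pixels of the sub-image.
            features = [
                image[r][c]
                for r in range(top, top + win_h)
                for c in range(left, left + win_w)
            ]
            score = score_fn(features)
            if score > threshold:
                detections.append(((top, left, win_h, win_w), score))
    return detections


def mean_brightness(features: List[float]) -> float:
    """Toy stand-in for a trained classifier: average pixel value."""
    return sum(features) / len(features)


# A 4x4 image with a bright 2x2 "object" in the center.
image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
detections = sliding_window_detect(
    image, window=(2, 2), stride=1, score_fn=mean_brightness, threshold=0.9
)
print(detections)  # [((1, 1, 2, 2), 1.0)]
```

In this sketch the classifier sees only the pixels inside the window. Contextual rescoring, as studied in the first part of the thesis, would adjust each returned score using information from outside that window.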

BibTeX

@phdthesis{Divvala-2012-7566,
author = {Santosh Kumar Divvala},
title = {Context and Subcategories for Sliding Window Object Recognition},
year = {2012},
month = {August},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-12-17},
keywords = {Object Category Detection, Sliding Window, Context, Subcategories},
}

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.