Discovering Objects and Their Location in Images
Abstract
We seek to discover the object categories depicted in a set of unlabelled images. We achieve this using a model developed in the statistical text literature: probabilistic Latent Semantic Analysis (pLSA). In text analysis this is used to discover topics in a corpus using the bag-of-words document representation. Here we treat object categories as topics, so that an image containing instances of several categories is modeled as a mixture of topics. The model is applied to images by using a visual analogue of a word, formed by vector quantizing SIFT-like region descriptors. The topic discovery approach successfully translates to the visual domain: for a small set of objects, we show that both the object categories and their approximate spatial layout are found without supervision. Performance of this unsupervised method is compared to the supervised approach of Fergus et al. [8] on a set of unseen images containing only one object per image. We also extend the bag-of-words vocabulary to include 'doublets', which encode spatially local co-occurring regions. It is demonstrated that this extended vocabulary gives a cleaner image segmentation. Finally, the classification and segmentation methods are applied to a set of images containing multiple objects per image. These results demonstrate that we can successfully build object class models from an unsupervised analysis of images.
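pLSA models each image d as a mixture of topics, P(w|d) = sum over z of P(w|z) P(z|d), and the two conditional distributions are fitted with expectation-maximisation. The sketch below illustrates those EM updates on a tiny count matrix. It assumes the images have already been converted into visual-word counts (the vector-quantised SIFT-like descriptors mentioned above); that step is not shown. The helper names `plsa` and `normalize`, the random initialisation, the fixed iteration count and the toy data are all hypothetical choices made for illustration, not the authors' implementation.

```python
import random


def normalize(values):
    """Scale non-negative values to sum to 1 (uniform if they are all zero)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def plsa(counts, num_topics, iterations=200, seed=0):
    """Hypothetical helper: fit pLSA by EM on an image-by-word count matrix.

    counts[d][w] = n(w, d), the number of times visual word w occurs in image d.
    Returns (p_w_z, p_z_d) with p_w_z[z][w] = P(w|z) and p_z_d[d][z] = P(z|d).
    """
    rng = random.Random(seed)
    num_docs, num_words = len(counts), len(counts[0])

    # Assumed random initialisation of both conditional distributions.
    p_w_z = [normalize([rng.random() for _ in range(num_words)]) for _ in range(num_topics)]
    p_z_d = [normalize([rng.random() for _ in range(num_topics)]) for _ in range(num_docs)]

    for _ in range(iterations):
        # Expected counts n(w, d) * P(z | d, w), accumulated in the E-step.
        exp_w_z = [[0.0] * num_words for _ in range(num_topics)]
        exp_z_d = [[0.0] * num_topics for _ in range(num_docs)]
        for d in range(num_docs):
            for w in range(num_words):
                n = counts[d][w]
                if n == 0:
                    continue  # zero counts contribute nothing to the updates
                # E-step: P(z | d, w) is proportional to P(w | z) * P(z | d).
                posterior = normalize([p_w_z[z][w] * p_z_d[d][z] for z in range(num_topics)])
                for z in range(num_topics):
                    exp_w_z[z][w] += n * posterior[z]
                    exp_z_d[d][z] += n * posterior[z]
        # M-step: renormalise the expected counts into distributions.
        p_w_z = [normalize(row) for row in exp_w_z]
        p_z_d = [normalize(row) for row in exp_z_d]
    return p_w_z, p_z_d


# Hypothetical toy data: images 0-1 use only visual words 0-1,
# images 2-3 use only visual words 2-3.
counts = [
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [0, 0, 6, 3],
    [0, 0, 3, 6],
]
p_w_z, p_z_d = plsa(counts, num_topics=2)
# Assign each image to the topic with the largest P(z | d).
dominant = [max(range(2), key=lambda z: row[z]) for row in p_z_d]
print(dominant)
```

Taking the topic with the largest P(z|d) is one simple way to turn the fitted mixture into a per-image category label. On this toy matrix, images that share visual words should end up sharing a topic. The topic indices themselves are arbitrary, since pLSA is unsupervised.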
BibTeX
@conference{Sivic-2005-9329,
author = {Josef Sivic and Bryan Russell and Alexei A. Efros and Andrew Zisserman and Bill Freeman},
title = {Discovering Objects and Their Location in Images},
booktitle = {Proceedings of (ICCV) International Conference on Computer Vision},
year = {2005},
month = {October},
volume = {1},
pages = {370--377},
}