Discovering Objects and Their Location in Images - Robotics Institute Carnegie Mellon University

Josef Sivic, Bryan Russell, Alexei A. Efros, Andrew Zisserman, and Bill Freeman
Conference Paper, Proceedings of (ICCV) International Conference on Computer Vision, Vol. 1, pp. 370-377, October 2005


We seek to discover the object categories depicted in a set of unlabelled images. We achieve this using a model developed in the statistical text literature: probabilistic Latent Semantic Analysis (pLSA). In text analysis this is used to discover topics in a corpus using the bag-of-words document representation. Here we treat object categories as topics, so that an image containing instances of several categories is modeled as a mixture of topics.

The model is applied to images by using a visual analogue of a word, formed by vector quantizing SIFT-like region descriptors. The topic discovery approach successfully translates to the visual domain: for a small set of objects, we show that both the object categories and their approximate spatial layout are found without supervision. Performance of this unsupervised method is compared to the supervised approach of Fergus et al. [8] on a set of unseen images containing only one object per image.

We also extend the bag-of-words vocabulary to include 'doublets' which encode spatially local co-occurring regions. It is demonstrated that this extended vocabulary gives a cleaner image segmentation. Finally, the classification and segmentation methods are applied to a set of images containing multiple objects per image. These results demonstrate that we can successfully build object class models from an unsupervised analysis of images.
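In pLSA, each image d is treated as a bag of visual words w, and its word distribution is factored through latent topics z as P(w|d) = Σ_z P(w|z) P(z|d). The model is commonly fitted with expectation-maximization (EM). What follows is a minimal pure-Python sketch of that pipeline. It assumes a visual vocabulary (the cluster centres that SIFT-like descriptors are quantized against) has already been built, and it runs a fixed number of EM iterations rather than testing for convergence. The names `quantize`, `bag_of_words`, `plsa`, and `log_likelihood` are hypothetical illustrations and not the authors' code. The sketch also leaves out the doublet extension.

```python
import math
import random

# Hypothetical helpers for illustration; not the authors' implementation.


def quantize(descriptors, vocabulary):
    """Map each descriptor to the index of its nearest visual word (squared Euclidean distance)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(vocabulary)), key=lambda k: sqdist(d, vocabulary[k]))
            for d in descriptors]


def bag_of_words(word_ids, vocab_size):
    """Histogram of visual-word occurrences for one image, i.e. n(w, d)."""
    counts = [0] * vocab_size
    for w in word_ids:
        counts[w] += 1
    return counts


def normalize(v):
    """Scale a non-negative vector to sum to 1 (uniform if it is all zeros)."""
    s = sum(v)
    return [x / s for x in v] if s > 0 else [1.0 / len(v)] * len(v)


def plsa(counts, num_topics, iters=100, seed=0):
    """Fit pLSA by EM, where counts[d][w] = n(w, d).

    Returns (p_w_z, p_z_d) with p_w_z[z][w] = P(w|z) and p_z_d[d][z] = P(z|d).
    Assumption: a fixed iteration count stands in for a convergence test.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_w_z = [normalize([rng.random() for _ in range(n_words)]) for _ in range(num_topics)]
    p_z_d = [normalize([rng.random() for _ in range(num_topics)]) for _ in range(n_docs)]
    for _ in range(iters):
        acc_w_z = [[0.0] * n_words for _ in range(num_topics)]
        acc_z_d = [[0.0] * num_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(w|z) P(z|d).
                post = normalize([p_w_z[z][w] * p_z_d[d][z] for z in range(num_topics)])
                # Accumulate the expected counts n(w,d) P(z|d,w).
                for z in range(num_topics):
                    acc_w_z[z][w] += n * post[z]
                    acc_z_d[d][z] += n * post[z]
        # M-step: renormalize the expected counts into P(w|z) and P(z|d).
        p_w_z = [normalize(row) for row in acc_w_z]
        p_z_d = [normalize(row) for row in acc_z_d]
    return p_w_z, p_z_d


def log_likelihood(counts, p_w_z, p_z_d):
    """Sum over d, w of n(w,d) log P(w|d), with P(w|d) = sum_z P(w|z) P(z|d)."""
    total = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n:
                p = sum(p_w_z[z][w] * p_z_d[d][z] for z in range(len(p_w_z)))
                total += n * math.log(p)
    return total


if __name__ == "__main__":
    # Toy image-by-word counts: images 0-1 use words 0-2, images 2-3 use words 3-5.
    counts = [
        [5, 4, 6, 0, 0, 0],
        [4, 6, 5, 0, 0, 0],
        [0, 0, 0, 5, 6, 4],
        [0, 0, 0, 6, 4, 5],
    ]
    p_w_z, p_z_d = plsa(counts, num_topics=2, iters=200)
    for d, mix in enumerate(p_z_d):
        # Print each image's dominant topic and its topic mixture P(z|d).
        print(d, max(range(2), key=lambda z: mix[z]), [round(p, 3) for p in mix])
```

On toy data like this, images built from the same group of words would be expected to share a dominant topic. EM only reaches a local optimum of the likelihood, though, so the result can depend on the random seed. The topic posterior P(z|d,w) computed in the E-step is also the quantity one could use to assign individual regions to topics.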


author = {Josef Sivic and Bryan Russell and Alexei A. Efros and Andrew Zisserman and Bill Freeman},
title = {Discovering Objects and Their Location in Images},
booktitle = {Proceedings of (ICCV) International Conference on Computer Vision},
year = {2005},
month = {October},
volume = {1},
pages = {370--377},