Towards a model for mid-level feature representation of scenes - Robotics Institute Carnegie Mellon University

Towards a model for mid-level feature representation of scenes

Mariya Toneva, Elissa Aminoff, Abhinav Gupta, and Michael Tarr
Journal Article, Journal of Vision, Vol. 14, No. 10, August, 2014

Abstract

Never Ending Image Learner (NEIL) is a semi-supervised learning algorithm that continuously pulls images from the web and learns relationships among them. NEIL has classified over 400,000 images into 917 scene categories using 84 dimensions, termed "attributes". These attributes roughly correspond to mid-level visual features whose differential combinations define a large scene space. As such, NEIL's small set of attributes offers a candidate model for the psychological and neural representation of scenes. To investigate this, we tested for significant similarities between the structure of scene space defined by NEIL and the structure of scene space defined by patterns of human BOLD responses as measured by fMRI. The specific scenes in our study were selected by reducing the number of attributes to the 39 that best accounted for variance in NEIL's scene-attribute co-classification scores. Fifty scene categories were then selected such that each category scored highly on a different set of at most three of the 39 attributes. From each scene category we then selected the two images most representative of its high-scoring attributes, for a total of 100 stimuli. Canonical correlation analysis (CCA) was used to test the relationship between measured BOLD patterns within the functionally defined parahippocampal region and NEIL's representation of each stimulus as a vector containing stimulus-attribute co-classification scores on the 39 attributes. CCA revealed significant similarity between the local structures of the fMRI data and the NEIL representations for all participants. In contrast, neither the entire set of 84 attributes nor 39 randomly chosen attributes produced significant results using this CCA method.
Overall, our results indicate that subsets of the attributes learned by NEIL are effective in accounting for variation in the neural encoding of scenes; as such, they represent a first-pass compositional model of mid-level features for scene representation.
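The CCA comparison described above can be sketched in a few lines. This is a minimal illustration only, not the authors' analysis pipeline: the data here are synthetic stand-ins for the BOLD voxel patterns and the 39-dimensional NEIL attribute vectors, and the matrix sizes and noise level are hypothetical. Canonical correlations are computed as the singular values of the whitened cross-covariance between the two views.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Return the canonical correlations between two data views.

    X: (n_samples, p) array, e.g. BOLD responses per stimulus.
    Y: (n_samples, q) array, e.g. NEIL attribute scores per stimulus.
    """
    # Center each view.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)

    def inv_sqrt(C, eps=1e-10):
        # Inverse matrix square root of a symmetric PSD covariance,
        # with small eigenvalues clipped for numerical stability.
        w, V = np.linalg.eigh(C)
        w = np.clip(w, eps, None)
        return V @ np.diag(w ** -0.5) @ V.T

    n = len(X)
    Cxx = X.T @ X / (n - 1)
    Cyy = Y.T @ Y / (n - 1)
    Cxy = X.T @ Y / (n - 1)

    # Singular values of the whitened cross-covariance are the
    # canonical correlations, sorted in descending order.
    M = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
    return np.linalg.svd(M, compute_uv=False)

# Hypothetical data: 100 stimuli, 60 voxels, 39 NEIL attributes.
rng = np.random.default_rng(0)
attrs = rng.standard_normal((100, 39))
# Simulated BOLD responses linearly related to the attributes plus noise.
bold = attrs @ rng.standard_normal((39, 60)) + 0.5 * rng.standard_normal((100, 60))

rho = canonical_correlations(bold, attrs)
```

In practice, significance of the leading correlations would be assessed with a permutation test (shuffling stimulus labels in one view and recomputing `rho`), since canonical correlations are upward-biased when the number of features approaches the number of stimuli.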

Notes
Meeting abstract presented at the 14th Annual Meeting of the Vision Sciences Society (VSS 2014)

BibTeX

@article{Toneva-2014-121576,
author = {Mariya Toneva and Elissa Aminoff and Abhinav Gupta and Michael Tarr},
title = {Towards a model for mid-level feature representation of scenes},
journal = {Journal of Vision},
year = {2014},
month = {August},
volume = {14},
number = {10},
}