Seeing the World Behind the Image: Spatial Layout for 3D Scene Understanding
Abstract
When humans look at an image, they see not just a pattern of color and texture, but the world behind the image. In the same way, computer vision algorithms must go beyond the pixels and reason about the underlying scene. In this dissertation, we propose methods to recover the basic spatial layout from a single image and begin to investigate its use as a foundation for scene understanding. Our spatial layout is a description of the 3D scene in terms of surfaces, occlusions, camera viewpoint, and objects. We propose a geometric class representation, a coarse categorization of surfaces according to their 3D orientations, and learn appearance-based models of geometry to identify surfaces in an image. These surface estimates serve as a basis for recovering the boundaries and occlusion relationships of prominent objects. We further show that simple reasoning about camera viewpoint and object size in the image allows accurate inference of the viewpoint and greatly improves object detection. Finally, we demonstrate the potential usefulness of our methods in applications to 3D reconstruction, scene synthesis, and robot navigation. Scene understanding from a single image requires strong assumptions about the world. We show that the necessary assumptions can be modeled statistically and learned from training data. Our work demonstrates the importance of robustness through a wide variety of image cues, multiple segmentations, and a general strategy of soft decisions and gradual inference of image structure. Above all, our work manifests the tremendous amount of 3D information that can be gleaned from a single image. Our hope is that this dissertation will inspire others to further explore how computer vision can go beyond pattern recognition and produce an understanding of the environment.
BibTeX
@phdthesis{Hoiem-2007-9791,author = {Derek Hoiem},
title = {Seeing the World Behind the Image: Spatial Layout for 3D Scene Understanding},
year = {2007},
month = {August},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-07-28},
}