Image interpretation, the ability to see and understand the three-dimensional world behind a two-dimensional image, goes to the very heart of the computer vision problem. The overall objective of this research effort is, given a single image, to automatically produce a coherent interpretation of the depicted scene. On one level, such an interpretation should include opportunistically recognizing known objects (e.g. people, houses, cars, trees) and known materials (e.g. grass, sand, rock, foliage), as well as their rough positions and orientations within the scene. But more than that, the goal is to capture the overall sense of the scene even if we do not recognize some of its constituent parts.
To address this extremely difficult task, we propose a novel framework that jointly models the elements that make up a scene within the geometric context of the 3D space they occupy. Because none of the quantities measured in the image (geometry, materials, objects and object parts, scene classes, camera pose, etc.) is reliable in isolation, they must all be considered together in a coherent way. A geometric context representation allows all the elements of the image to be physically “placed” within this contextual frame and permits reasoning about them and their 3D environment in a joint optimization framework. Over the course of this project, we will develop such a framework, allowing a geometrically coherent semantic interpretation of an image to emerge.
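To give a concrete flavor of why joint reasoning helps, the sketch below scores a candidate scene hypothesis by combining per-element recognition confidences with a simple geometric-compatibility term under a ground-plane camera model, then keeps the highest-scoring hypothesis. This is a minimal illustration written for this description, not the project's actual model: the names (SceneElement, geometric_compatibility), the assumed horizon candidates, and the assumption that elements have similar world heights are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
import math

@dataclass
class SceneElement:
    """One hypothesized scene element (hypothetical toy representation)."""
    label: str          # e.g. "person", "car"
    confidence: float   # recognition score in isolation, in [0, 1]
    ground_y: float     # image row where the element meets the ground
    height_px: float    # apparent height in pixels

def geometric_compatibility(a: SceneElement, b: SceneElement, horizon_y: float) -> float:
    """Toy pairwise term: under a ground-plane model, an element's apparent
    height grows with its distance below the horizon (ground_y - horizon_y).
    Crudely assumes similar world heights; penalizes contradictory pairs."""
    da, db = a.ground_y - horizon_y, b.ground_y - horizon_y
    if da <= 0 or db <= 0:
        return -1.0  # an element resting "above" the horizon contradicts the ground plane
    expected = da / db                    # size ratio predicted by geometry
    observed = a.height_px / b.height_px  # size ratio actually measured
    return -abs(math.log(observed / expected))  # 0 when perfectly consistent

def score_hypothesis(elements: list[SceneElement], horizon_y: float) -> float:
    """Joint score = isolated recognition evidence + geometric coherence.
    Neither term is trusted alone; the sum forces them to agree."""
    unary = sum(e.confidence for e in elements)
    pairwise = sum(geometric_compatibility(a, b, horizon_y)
                   for a, b in combinations(elements, 2))
    return unary + pairwise

# Two competing interpretations of the same detections, differing only in the
# assumed horizon line (a stand-in for the unknown camera pose).
detections = [
    SceneElement("person", 0.8, ground_y=300, height_px=60),
    SceneElement("car",    0.7, ground_y=350, height_px=110),
]
best = max([120.0, 250.0], key=lambda h: score_hypothesis(detections, h))
print(f"preferred horizon: y={best}")
```

In this toy setting the geometric term alone cannot choose a horizon and the recognition scores alone say nothing about camera pose, yet scoring them jointly selects the interpretation in which object sizes, ground contact points, and camera pose are mutually consistent.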
This project is funded by NSF CAREER award IIS-0546547.