Image texture can be an important clue to the 3D structure of a scene. It can also confound certain algorithms, like stereo, if it is not recognized and explicitly accounted for. Until now, there has been no reliable means of detecting and exploiting regions of texture in images of realistic scenes.
Our Approach
We are addressing this problem using the space/frequency representation, which shows the local spatial frequency content at every point in the image. We compute it by taking the 2D Fourier power spectrum in a neighborhood around each pixel we choose to analyze; the result is called the image spectrogram. It preserves the spatial coherence of the image that would otherwise be lost in a single power spectrum of the whole image. The spectrogram is similar to other space/frequency representations such as the Wigner distribution, Gabor filters, and wavelets.
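As a concrete illustration, here is a minimal sketch of that computation in Python with NumPy. The window size, sampling step, and Hann weighting are our own illustrative choices, not parameters from the original work.

```python
import numpy as np

def spectrogram(image, window=64, step=16):
    """Local 2D Fourier power spectra over a grid of image neighborhoods.

    For each sampled pixel, weight the surrounding patch with a Hann
    window (to reduce edge artifacts) and take the squared magnitude of
    its 2D FFT. Returns a dict mapping each neighborhood center
    (row, col) to its power spectrum, zero frequency at the center.
    """
    h, w = image.shape
    half = window // 2
    hann = np.outer(np.hanning(window), np.hanning(window))
    spectra = {}
    for r in range(half, h - half, step):
        for c in range(half, w - half, step):
            patch = image[r - half:r + half, c - half:c + half] * hann
            spectra[(r, c)] = np.abs(np.fft.fftshift(np.fft.fft2(patch))) ** 2
    return spectra
```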
We have shown how the spectrogram of an image lets us analyze many disparate phenomena with a single representation. It is best suited for phenomena that must be described in terms of both spatial and frequency coordinates. Our early work demonstrated its usefulness for texture segmentation, shape-from-texture, and the analysis of aliasing, zoom, and blur. We have since developed some of these initial ideas into an algorithm for segmenting and computing surface normals from multiple regions of image texture.
Segmentation and Shape from Texture
Most current texture research is aimed at either segmenting regions of similar texture or computing surface normals from texture gradients (shape-from-texture). The segmentation research assumes that the textures in an image are flat and viewed frontally. The shape-from-texture research assumes that there is only one texture in the scene, or that the similarly textured regions have already been segmented. These assumptions are mutually exclusive, so neither line of work can segment and compute surface normals from realistic images, which typically violate both. Our goal is to outline the regions of similar texture in spite of the deformations caused by 3D perspective effects.
We solved this problem using the image spectrogram. We begin by computing local surface normals based on shape-induced frequency shifts. As a textured surface recedes from the viewer, its texture elements project to smaller regions of the image, so its frequencies appear higher. We have developed a mathematical relationship between these frequency shifts and the surface normal. For presegmented images, we can compute surface normals to within about four degrees. When we don't know the texture boundaries, we can still get a rough estimate of the local surface normal by comparing frequency shifts between nearby points in the image.
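To illustrate the flavor of this computation, the sketch below uses a simplified first-order model of our own, not the relationship developed in the original work: for a plane tilted along the x axis, depth grows roughly linearly across the image, so a texture's projected peak frequency grows at the same relative rate, and comparing peak frequencies at two nearby points recovers the slant.

```python
import numpy as np

def slant_from_frequency_shift(u0, u1, dx, focal_length):
    """Illustrative first-order slant estimate from a frequency shift.

    Simplified model (not the original derivation): for a plane tilted
    along x, depth varies as Z(x) ~ Z0 * (1 + x*tan(slant)/f), so the
    projected peak frequency of a uniform texture varies as
        u(x) ~ u0 * (1 + x*tan(slant)/f).
    Given peak frequencies u0 and u1 measured dx apart (dx and f in the
    same units, e.g. pixels), solve for the slant in degrees.
    """
    relative_gradient = (u1 - u0) / (u0 * dx)   # (1/u) du/dx
    return np.degrees(np.arctan(relative_gradient * focal_length))

# Example: the peak frequency rises 5% over 100 pixels with f = 800
# pixels, suggesting a slant of roughly 22 degrees under this model.
print(slant_from_frequency_shift(0.20, 0.21, 100.0, 800.0))
```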
In order to segment the textures, we merge image regions with similar texture. However, shape-induced frequency shifts can make similar textures appear quite different, which often leads to a poor segmentation. We solve this problem by using the local surface normal estimates to undo the 3D perspective effects, giving "frontalized" versions of the textures' power spectra. For each pair of neighboring regions, we make the tentative assumption that both come from the same planar surface with the same texture. If their frontalized power spectra are similar, they are merged into one region. This merging continues until the textures are segmented. We use a novel "minimum description length" criterion for evaluating potential merges. The result is a segmented image along with the surface normals of the textured regions. We know of no other algorithm that can segment 3D textured surfaces by explicitly accounting for 3D shape effects.
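The sketch below shows the shape of this merging loop. The frontalized spectra are taken as given, and a plain similarity threshold stands in for the minimum description length criterion; the function names, data structures, and threshold are all our own illustrative choices.

```python
import numpy as np

def spectra_similar(p, q):
    """Normalized correlation between two frontalized power spectra."""
    p, q = p.ravel(), q.ravel()
    return np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q) + 1e-12)

def merge_regions(spectra, adjacency, threshold=0.9):
    """Greedily merge adjacent regions whose frontalized spectra agree.

    `spectra` maps a region id to its frontalized power spectrum, and
    `adjacency` is a set of pairs of neighboring region ids. Each merge
    keeps the lower id, averages the two spectra as the combined texture
    model, and rewrites the adjacency accordingly. Returns the surviving
    spectra and a map recording which regions were absorbed.
    """
    spectra = dict(spectra)
    edges = {frozenset(e) for e in adjacency}
    absorbed = {}
    changed = True
    while changed:
        changed = False
        for edge in sorted(edges, key=sorted):
            a, b = sorted(edge)
            if spectra_similar(spectra[a], spectra[b]) > threshold:
                spectra[a] = (spectra[a] + spectra[b]) / 2
                del spectra[b]
                absorbed[b] = a
                edges = {renamed for renamed in
                         (frozenset(a if r == b else r for r in e)
                          for e in edges)
                         if len(renamed) == 2}  # drop self-edges
                changed = True
                break
    return spectra, absorbed
```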
Implications
The space/frequency representation has proven useful for solving the combined problem of segmentation and shape from texture. Other researchers here are using the same representation for stereo and depth-from-focus. We envision a gradual merging of image understanding algorithms based on the space/frequency representation – "The Unified Theory of Spatial Vision."