Obtaining depth information by actively controlling camera parameters is becoming increasingly important in machine vision, because the approach is passive (no energy is projected into the scene) and monocular. Focus interpretation is a valuable alternative to stereo vision because it does not require solving the correspondence problem to recover depth.
There are two distinct scenarios for using focus information to recover depth: depth from focus and depth from defocus.
Depth From Focus
In depth from focus, we determine the distance to a single point by taking many images in progressively better focus. This is also called "autofocus" or "software focus".
The key problems in depth from focus have been the choice of the focus criterion and efficient peak detection in the focus criterion profile. We used the Tenengrad operator to measure focus quality because of its monotonicity and relatively sharp peak. Due to noise and other imperfections, however, the focus criterion profile usually displays a number of small ripples, which can trap the traditional Fibonacci search in local extrema. Based on the observation that the ripples are small in scale, we developed a two-step peak detection method: a coarse Fibonacci search followed by fine-tuning, in which a curve is fitted to the local focus criterion profile to locate the real peak. Surprisingly, this simple technique yields a large performance improvement: the precision of depth estimation from focus can be as high as 1/1000 when the target is 1.2 m from the camera. Before our work, the best previously reported result was 1/200 at a distance of about 1 m.
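As a rough illustration of the two ingredients, here is a minimal Python sketch of a Tenengrad-style focus measure and the curve-fitting refinement step. The Sobel-based form, the gradient threshold, and the parabolic fit are common textbook choices standing in for the exact variants our system uses.

```python
import numpy as np
from scipy import ndimage

def tenengrad(image, threshold=0.0):
    """Tenengrad focus measure: sum of squared Sobel gradient magnitudes.
    Larger values indicate sharper focus; `threshold` (an assumed knob)
    suppresses low-magnitude gradients caused by noise."""
    img = image.astype(float)
    gx = ndimage.sobel(img, axis=1)
    gy = ndimage.sobel(img, axis=0)
    g2 = gx * gx + gy * gy
    return g2[g2 > threshold].sum()

def refine_peak(positions, scores):
    """Fine-tuning step: fit a parabola to the focus criterion profile
    around the coarse peak and return the sub-step focus position of
    its maximum (assumes the samples bracket the true peak)."""
    a, b, _ = np.polyfit(positions, scores, 2)
    return -b / (2.0 * a)

# Usage: a coarse search (e.g. Fibonacci) picks the best sampled focus
# setting i; refine_peak then interpolates between its neighbours:
#   best = refine_peak(motor_steps[i-1:i+2], profile[i-1:i+2])
```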
Depth From Defocus
By taking a small number of images under different lens parameters, we can determine depth at all points in the scene. This makes depth from defocus a potential range image sensor, competing with laser range scanners and stereo vision. The best previously reported result is 1.3% RMS error in distance from the camera when the target is about 0.9 m away.
The depth from defocus method uses the direct relationships among depth, camera parameters, and the amount of blurring in the images to derive depth from quantities that can be measured directly. The key problems are measuring the difference in the amount of blurring between images and calibrating the mapping between depth and that blurring difference.
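For context, the standard thin-lens geometry gives one such direct relationship; our actual model is calibrated in lens motor coordinates, so this textbook form is illustrative only:

```latex
% A point at depth $u$, imaged through a thin lens of focal length $f$
% and aperture diameter $D$ with the sensor at distance $s$ behind the
% lens, produces a blur circle of diameter
\[
  b \;=\; D\,s\,\left|\,\frac{1}{f} - \frac{1}{u} - \frac{1}{s}\,\right|,
\]
% so measuring $b$ (or the difference in $b$ between two lens settings)
% with known $f$, $D$, and $s$ determines the depth $u$.
```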
To preserve locality, we have to employ the windowed Fourier transform. But because of the spectral blurring introduced by the window, direct use of the Fourier magnitude information tends to produce large errors. The maximal resemblance estimation eliminates the window effect by iteratively convolving the less blurred image with an artificial point spread function whose spatial constant is the blurring difference computed in the previous iteration. Combined with proper thresholding of the magnitude information to suppress noise, the maximal resemblance estimation converges quickly to very accurate estimates of the blurring difference.
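A minimal sketch of the underlying idea, with two loudly labeled assumptions: a Gaussian point spread function and a spatial-domain least-squares resemblance criterion. The original method instead iterates on windowed Fourier magnitudes, so this is an illustration of the convolve-and-compare principle, not our implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.optimize import minimize_scalar

def blur_difference(sharper, blurrier, sigma_max=5.0):
    """Estimate the blurring difference between two registered patches
    by convolving the less blurred one with an artificial Gaussian PSF
    and maximizing resemblance to the more blurred one (here: by
    minimizing the squared difference). `sigma_max` is an assumed
    search bound."""
    a = sharper.astype(float)
    b = blurrier.astype(float)

    def mismatch(sigma):
        return np.sum((gaussian_filter(a, sigma) - b) ** 2)

    res = minimize_scalar(mismatch, bounds=(0.0, sigma_max),
                          method="bounded")
    return res.x  # spatial constant of the blur difference
```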
Combining this new method with a blurring model based on lens motor coordinates, we have demonstrated depth estimation from defocus with a precision of 1/200 when the target is 2.5 m from the camera. The best previously reported result was about 1/77 at a distance of 0.9 m.
Further Work
We are now working to blend information from defocus and stereo vision in a single system, to take maximum advantage of the complementary information provided by each. The key observation is that defocus computation is based on a difference of Fourier magnitude, while stereo disparity is based on Fourier phase, so that the two can be combined in a single framework. We are developing a new method for 3D model acquisition using this technique, so that the camera position (i.e. stereo) and lens parameters (i.e. focus) can be jointly optimized to get the most information from each new image.
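The observation can be made concrete with a small sketch: for two registered patches, the windowed Fourier magnitude ratio carries the blur difference while the phase difference carries the shift. The Hanning window and the function names are illustrative assumptions, not the system's actual implementation.

```python
import numpy as np

def magnitude_and_phase_cues(patch_a, patch_b):
    """Split a windowed Fourier comparison of two same-sized patches
    into the two complementary cues: magnitude ratios (defocus cue)
    and phase differences (stereo disparity cue)."""
    h, w = patch_a.shape
    window = np.outer(np.hanning(h), np.hanning(w))
    Fa = np.fft.fft2(patch_a * window)
    Fb = np.fft.fft2(patch_b * window)
    eps = 1e-8  # avoid division by zero at weak frequencies
    magnitude_ratio = np.abs(Fb) / (np.abs(Fa) + eps)  # blur difference
    phase_difference = np.angle(Fb * np.conj(Fa))      # local shift
    return magnitude_ratio, phase_difference
```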