Direct Multiple View Visual Simultaneous Localization And Mapping - Robotics Institute Carnegie Mellon University

PhD Thesis Proposal

Hatem Alismail, Carnegie Mellon University
Wednesday, August 12
9:00 am to 12:00 pm

Event Location: NSH 1305

Abstract: We propose a direct, featureless, Lucas-Kanade-based method as a reliable Visual Simultaneous Localization And Mapping (VSLAM) solution in challenging environments, where feature detection and precise subpixel localization may be unreliable. Current state-of-the-art direct methods have been shown to perform well on a range of challenging datasets. Nonetheless, they have been limited to relative pose estimation between two frames (Visual Odometry), and hence do not fully exploit the information contained in a stream of imagery.
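As background, the brightness-constancy objective underlying such direct, Lucas-Kanade-style methods can be sketched as follows. This is an illustrative sketch with our own function names, not code from the proposal:

```python
import numpy as np

def photometric_residuals(I_ref, I_cur, warp, points):
    """Brightness-constancy residuals r_i = I_cur(w(x_i)) - I_ref(x_i).

    `warp` maps reference-image pixel coordinates into the current image;
    direct methods minimize sum_i r_i**2 over the warp parameters instead
    of matching detected features. (Nearest-neighbour lookup for brevity;
    real implementations interpolate sub-pixel intensities.)
    """
    r = []
    for x, y in points:
        xw, yw = warp(x, y)
        r.append(I_cur[int(round(yw)), int(round(xw))] - I_ref[y, x])
    return np.asarray(r)
```

With the correct warp, the residuals vanish; a solver perturbs the warp parameters until they do.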

Extending direct methods to multiple views is a difficult problem because of the reliance on the brightness constancy assumption, which is seldom satisfied in robotic applications operating in unstructured scenes. Additionally, even if brightness constancy is satisfied, prior approaches to multiple view direct VSLAM either impose simplifying assumptions such as planarity of the world, or rely on the sub-optimal strategy of alternating optimization.

In this work, we propose a direct, joint optimization of the state vector over multiple views, akin to geometric Bundle Adjustment as commonly employed for optimal reconstruction from correspondences. The main difference, however, is that our framework operates directly on a function of the image data without requiring precomputed, fixed correspondences. Instead, correspondences are estimated automatically by our algorithm as a byproduct of estimating the structure and motion.
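As a toy illustration of joint versus alternating optimization (a 1-D sketch under our own assumptions, not the proposal's algorithm), consider several views of one signal, each offset by an unknown shift. All unknowns enter a single stacked residual vector that a standard Gauss-Newton solver refines jointly, and correspondences are implied by the warp rather than precomputed:

```python
import numpy as np

def gauss_newton(residual_fun, x0, iters=20, eps=1e-6):
    """Plain Gauss-Newton with a forward-difference Jacobian."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        r = residual_fun(x)
        J = np.empty((r.size, x.size))
        for j in range(x.size):
            xp = x.copy()
            xp[j] += eps
            J[:, j] = (residual_fun(xp) - r) / eps
        x -= np.linalg.lstsq(J, r, rcond=None)[0]
    return x

def template(t):
    return np.exp(-0.5 * (t - 3.0) ** 2)  # smooth 1-D "image"

t = np.linspace(0.0, 6.0, 200)
true_shifts = np.array([0.4, -0.3, 0.7])       # one unknown per view
views = [template(t - s) for s in true_shifts]

def stacked_residuals(shifts):
    # every view contributes to one residual vector: a joint solve over
    # the full state, not an alternation between per-view subproblems
    return np.concatenate([template(t - s) - v
                           for s, v in zip(shifts, views)])

est = gauss_newton(stacked_residuals, np.zeros(3))
```

In the full VSLAM setting the state vector would also contain scene structure shared across views, which is what makes the joint solve preferable to alternating between motion and structure.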

To address the limitation of the brightness constancy assumption, we propose to borrow descriptors from feature-based methods and to perform direct alignment under a “descriptor constancy” assumption. This new assumption is more robust to appearance variations, without affecting the dimension of the state vector or significantly increasing computational demands.

To this end, we consider binary descriptors as an illustrative case due to their computational efficiency and invariance to monotonic illumination changes. In particular, we demonstrate the use of the Census Transform (Local Binary Patterns). To address the non-smoothness and discontinuity of feature descriptors, we propose a multi-channel representation, which allows us to (i) efficiently estimate the gradient of the objective, and (ii) minimize the exact Hamming distance between binary descriptors using standard nonlinear least squares optimization algorithms.
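A minimal sketch of this multi-channel idea for the Census Transform (our own illustrative code; image borders wrap for brevity): each of the eight 3x3 neighbour comparisons becomes a {0,1} channel, so the summed squared channel difference at a pixel equals the Hamming distance between full 8-bit census descriptors and drops directly into a least-squares objective.

```python
import numpy as np

def census_channels(I):
    """8-channel binary Census Transform of a grayscale image.

    Channel k compares every pixel with its k-th 3x3 neighbour
    (1 where the neighbour is >= the centre). Borders wrap here
    for brevity; real code would handle them explicitly.
    """
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    C = np.zeros((8,) + I.shape)
    for k, (dy, dx) in enumerate(offsets):
        neighbour = np.roll(np.roll(I, -dy, axis=0), -dx, axis=1)
        C[k] = (neighbour >= I).astype(float)
    return C

def hamming_map(Ca, Cb):
    # per-pixel Hamming distance as a sum of squared channel differences:
    # for {0,1} values, (a - b)**2 == |a - b|, so standard nonlinear
    # least-squares machinery minimizes the exact Hamming distance
    return ((Ca - Cb) ** 2).sum(axis=0)
```

Because the comparison `neighbour >= centre` is unchanged by any monotonically increasing intensity transform, the channels, and hence the descriptors, are invariant to such illumination changes.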

We will evaluate our proposed Direct Multiple View VSLAM on a range of synthetic and real datasets, with comparisons against the state of the art. We will also work towards a deeper understanding of the descriptor constancy assumption and explore the use of other descriptors from the literature. Applications of descriptor constancy extend beyond VSLAM to other correspondence problems in vision, such as template tracking and optical flow, which we will consider in our analysis and evaluation.

Committee: Brett Browning, Co-chair

Simon Lucey, Co-chair

Michael Kaess

Martial Hebert

Ian Reid, University of Adelaide