3:00 pm to 4:00 pm
Abstract:
Accurate localization is essential for autonomous operation in many problem domains. It is most often performed by comparing LiDAR scans collected in real time to an HD point-cloud-based map. While this enables centimeter-level accuracy, it depends on an expensive LiDAR sensor at run time. Recently, efforts have been underway to reduce cost by using cheaper cameras to perform 2D-3D localization. In contrast to previous work that learns relative pose by comparing projected depth and camera images, we propose HyperMap, a paradigm shift from online depth feature extraction to offline map feature computation for the 2D-3D camera registration task through end-to-end training. We refer to the decision to perform projection before feature extraction as “early projection”, and to our approach, which precomputes the 3D features on the map before projection, as “late projection”.
We first perform offline 3D sparse convolution to extract and compress the voxelwise hypercolumn features for the whole map. Then, at run time, we project and decode the compressed map features at the rough initial camera pose to form a virtual feature image. A CNN is then used to predict the relative pose between the camera image and the virtual feature image. In addition, we propose a novel differentiable occlusion handling layer, designed especially for large point clouds, to remove occluded points during projection. Our experiments on synthetic and real datasets show that our method significantly reduces map size while maintaining comparable or better performance.
Committee:
Simon Lucey (Advisor)
Michael Kaess (Co-advisor)
Sebastian Scherer
Chen-Hsuan Lin