HyperMap: Compressed 3D Map for Monocular Camera Registration
Abstract
We address the problem of image registration to a compressed 3D map. Registration to a 3D map is most often performed by comparing LiDAR scans to a point-cloud-based map, but this approach depends on an expensive LiDAR sensor at run time, and the large point-cloud map creates overhead in data storage and transmission. Recently, efforts have been underway to replace the expensive LiDAR sensor with cheaper cameras and perform 2D-3D localization. In contrast to previous work that learns relative pose by comparing projected depth and camera images, we propose HyperMap, a paradigm shift from online depth map feature extraction to offline 3D map feature computation for the 2D-3D camera registration task, trained end to end. In the proposed pipeline, we first perform offline 3D sparse convolution to extract and compress voxelwise hypercolumn features for the whole map. Then, at run time, we project the compressed map features at a rough initial camera pose and decode them to form a virtual feature image. A Convolutional Neural Network (CNN) then predicts the relative pose between the camera image and the virtual feature image. In addition, we propose an efficient occlusion handling layer, specifically designed for large point clouds, to remove occluded points during projection. Our experiments on synthetic and real datasets show that, by moving the feature computation load offline and compressing the map features, we reduce map size by 87-94% while maintaining comparable or better accuracy.
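The run-time projection step can be illustrated with a minimal sketch. It is not the occlusion handling layer proposed in the paper, whose details the abstract does not give; it swaps in a plain per-pixel z-buffer, which keeps only the nearest point that lands on each pixel. The function name `project_with_zbuffer`, its parameters, and the assumption that the points have already been transformed into the camera frame at the rough initial pose are all hypothetical choices made for illustration.

```python
import math

def project_with_zbuffer(points, features, fx, fy, cx, cy, width, height):
    """Project camera-frame 3D points into a feature image (hypothetical helper).

    A plain z-buffer stands in for the paper's occlusion handling layer:
    when several points fall on the same pixel, only the nearest is kept.
    """
    depth = [[math.inf] * width for _ in range(height)]
    image = [[None] * width for _ in range(height)]
    for (x, y, z), feat in zip(points, features):
        if z <= 0:
            continue  # point is behind the camera
        # Pinhole projection with focal lengths (fx, fy) and principal point (cx, cy).
        u = math.floor(fx * x / z + cx)
        v = math.floor(fy * y / z + cy)
        if 0 <= u < width and 0 <= v < height and z < depth[v][u]:
            depth[v][u] = z
            image[v][u] = feat
    return image, depth

# Two points share a pixel; one is behind the camera; one projects out of bounds.
points = [(0.0, 0.0, 5.0), (0.0, 0.0, 2.0), (0.0, 0.0, -1.0), (10.0, 0.0, 1.0)]
features = ["far", "near", "behind", "outside"]
image, depth = project_with_zbuffer(points, features, 1.0, 1.0, 1.0, 1.0, 3, 3)
```

A per-pixel z-buffer only resolves occlusion between points that fall on the same pixel. With a sparse point cloud, background points can still show through the gaps between foreground points, so a dedicated occlusion layer has to do more than this sketch does.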
BibTeX
@conference{Chang-2021-134105,
author = {Ming-Fang Chang and Joshua G. Mangelson and Michael Kaess and Simon Lucey},
title = {HyperMap: Compressed 3D Map for Monocular Camera Registration},
booktitle = {Proceedings of (ICRA) International Conference on Robotics and Automation},
year = {2021},
month = {May},
pages = {11739--11745},
}