LIDAR and Monocular Camera Fusion: On-road Depth Completion for Autonomous Driving
Abstract
LIDAR and RGB cameras are commonly used sensors in autonomous vehicles, but each has limitations: LIDAR provides accurate depth but is sparse in both vertical and horizontal resolution, while RGB images provide dense texture but lack depth information. In this paper, we fuse LIDAR and RGB images with a deep neural network that completes a denser pixel-wise depth map. The proposed architecture reconstructs this depth map by exploiting both dense color features and sparse 3D spatial features. We apply an early fusion technique and fine-tune a ResNet model as the encoder. The proposed Residual Up-Projection block recovers the spatial resolution of the feature map and captures context within the depth map. We also introduce a depth feature tensor that propagates context information from the encoder blocks to the decoder blocks. The method is evaluated on the large-scale indoor NYUdepthV2 dataset and on the KITTI odometry dataset, on both of which it outperforms the state-of-the-art method for fusing a single RGB image with depth. The method is also evaluated on a reduced-resolution KITTI dataset that simulates fusing a planar LIDAR with an RGB image.
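Early fusion needs the LIDAR data on the same pixel grid as the image before anything enters the network. The NumPy sketch below illustrates one common way to do this and is not the authors' code. It assumes three things: that the points have already been transformed into the camera frame, that the camera follows a pinhole model with intrinsic matrix `K`, and that "early fusion" means stacking the sparse depth map onto the RGB channels to form a 4-channel input. The function names `project_to_sparse_depth` and `early_fuse` are hypothetical.

```python
import numpy as np


def project_to_sparse_depth(points_cam, K, height, width):
    """Project (N, 3) camera-frame points into an (H, W) sparse depth map.

    Hypothetical helper. Pixels with no LIDAR return are set to 0, a common
    "missing depth" marker in depth-completion datasets.
    """
    points_cam = np.asarray(points_cam, dtype=np.float64)
    # Drop points on or behind the image plane.
    pts = points_cam[points_cam[:, 2] > 0]
    # Pinhole projection: [u*z, v*z, z]^T = K @ [x, y, z]^T for each point.
    uvw = pts @ K.T
    z = uvw[:, 2]
    u = np.round(uvw[:, 0] / z).astype(int)
    v = np.round(uvw[:, 1] / z).astype(int)
    # Keep only points that land inside the image.
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[inside], v[inside], z[inside]
    # When several points hit the same pixel, keep the nearest one.
    # np.minimum.at handles repeated indices deterministically.
    depth = np.full((height, width), np.inf)
    np.minimum.at(depth, (v, u), z)
    return np.where(np.isinf(depth), 0.0, depth).astype(np.float32)


def early_fuse(rgb, sparse_depth):
    """Stack an (H, W, 3) RGB image and an (H, W) sparse depth map into (H, W, 4).

    Assumes early fusion is channel-wise concatenation at the network input.
    """
    rgb = np.asarray(rgb, dtype=np.float32)
    return np.concatenate([rgb, sparse_depth[..., None]], axis=-1)
```

The nearest-point rule for pixel collisions is an illustrative choice. A real pipeline may also normalize the RGB and depth channels before the encoder.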
BibTeX
@conference{Fu-2019-118749,
  author = {Chen Fu and Christoph Mertz and John M. Dolan},
  title = {LIDAR and Monocular Camera Fusion: On-road Depth Completion for Autonomous Driving},
  booktitle = {Proceedings of IEEE Intelligent Transportation Systems Conference (ITSC '19)},
  year = {2019},
  month = {October},
  pages = {273--278},
  keywords = {autonomous driving, perception, LIDAR, computer vision, depth completion, sensor fusion, deep learning},
}