PhD Thesis Proposal

Daniel Maturana, Carnegie Mellon University
Friday, May 13
9:30 am to 12:00 pm
Semantic Mapping for Robotic Navigation and Exploration

Event Location: GHC 2109

Abstract: The last decade has seen remarkable advances in 3D perception for robotics. Progress in range sensing and SLAM now allows robots to easily acquire detailed 3D maps of their environment in real time.

However, adaptive robot behavior requires an understanding of the environment that goes beyond pure geometry. A step above purely geometric maps are so-called semantic maps, which incorporate task-oriented semantic labels in addition to 3D geometry. In other words, a map of *what* is *where*. This straightforward representation allows robots to use semantic labels for navigation and exploration planning.
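To make the representation concrete, the following is a minimal sketch, under our own assumptions about discretization and storage (the proposal does not specify these), of a semantic voxel map that records a label alongside occupancy for each cell:

```python
# Minimal sketch (illustrative assumptions, not a specification from the proposal)
# of a semantic map: a voxel grid storing a semantic label alongside occupancy,
# i.e. a map of *what* is *where*.
from dataclasses import dataclass, field
from typing import Dict, Tuple

VoxelIndex = Tuple[int, int, int]  # discretized (x, y, z) cell

@dataclass
class SemanticVoxelMap:
    resolution: float = 0.1  # voxel edge length in meters (assumed)
    occupancy: Dict[VoxelIndex, float] = field(default_factory=dict)  # P(occupied)
    labels: Dict[VoxelIndex, str] = field(default_factory=dict)       # e.g. "road", "vegetation"

    def voxel_of(self, x: float, y: float, z: float) -> VoxelIndex:
        r = self.resolution
        return (int(x // r), int(y // r), int(z // r))

    def update(self, x: float, y: float, z: float, p_occupied: float, label: str) -> None:
        v = self.voxel_of(x, y, z)
        self.occupancy[v] = p_occupied
        self.labels[v] = label

# A planner could then query, for example, which nearby voxels carry the label
# "road" before choosing a path.
```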

In this proposal we develop learning-based approaches for semantic mapping with image and range sensors. We make three main contributions.

In our first contribution, which is completed work, we developed VoxNet, a system for accurate and efficient semantic classification of 3D point cloud data. The key novelty in this system is the integration of volumetric occupancy maps with 3D Convolutional Neural Networks (CNNs). The system achieved state-of-the-art performance in 3D object recognition and helicopter landing zone detection.
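As a rough illustration only (not the VoxNet implementation itself), a classifier of this kind can be sketched as a small 3D CNN applied to a fixed-size occupancy grid; the input resolution and layer sizes below are assumptions chosen to approximate the published description:

```python
# Hypothetical sketch of a VoxNet-style classifier: a small 3D CNN over a
# 32x32x32 volumetric occupancy grid. Not the authors' code; shapes are assumed.
import torch
import torch.nn as nn

class VoxNetLike(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=5, stride=2),   # 32^3 occupancy grid -> 14^3
            nn.LeakyReLU(0.1),
            nn.Conv3d(32, 32, kernel_size=3, stride=1),  # 14^3 -> 12^3
            nn.LeakyReLU(0.1),
            nn.MaxPool3d(kernel_size=2),                 # 12^3 -> 6^3
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 6 * 6 * 6, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, occupancy: torch.Tensor) -> torch.Tensor:
        # occupancy: (batch, 1, 32, 32, 32) voxel grid, e.g. hit/miss occupancy values
        return self.classifier(self.features(occupancy))
```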

In our second contribution, motivated by the complementary information in image and point cloud data, we propose a CNN architecture that fuses both modalities. The architecture consists of two interconnected streams: a volumetric CNN stream for the point cloud data and a more traditional 2D CNN stream for the image data. We will evaluate this architecture on the tasks of terrain classification and obstacle detection for an autonomous All-Terrain Vehicle (ATV).
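For illustration, a two-stream fusion network of this kind might be sketched as follows; the input shapes, layer sizes, and fusion by feature concatenation are assumptions made for the sketch, not the proposed design:

```python
# Illustrative sketch of a two-stream network: a 2D CNN over an image patch and
# a 3D CNN over the corresponding voxelized point cloud, fused by concatenation.
# All shapes and sizes are assumed for illustration.
import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        # 2D stream for image patches (e.g. 3x64x64 crops)
        self.image_stream = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),      # -> 64-d feature
        )
        # 3D stream for occupancy grids (e.g. 1x32x32x32 volumes)
        self.volume_stream = nn.Sequential(
            nn.Conv3d(1, 32, 5, stride=2), nn.ReLU(),
            nn.Conv3d(32, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),      # -> 64-d feature
        )
        self.head = nn.Linear(64 + 64, num_classes)

    def forward(self, image: torch.Tensor, volume: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.image_stream(image), self.volume_stream(volume)], dim=1)
        return self.head(fused)
```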

In the final contribution, we propose a semantic mapping system for intelligent information gathering on Micro Aerial Vehicles (MAVs). In pursuit of a lightweight solution, we forego active range sensing and use monocular imagery as our main data source. This leads to various challenges, as we now must infer *where* as well as *what*. We outline our plan to solve these challenges using monocular cues, inertial sensing, and other information available to the vehicle.

Committee:
Sebastian Scherer, Chair
Martial Hebert
Abhinav Gupta
Raquel Urtasun, University of Toronto