Towards Modular and Differentiable Autonomous Driving - Robotics Institute Carnegie Mellon University

PhD Thesis Defense

Xinshuo Weng
Robotics Institute, Carnegie Mellon University
Tuesday, June 7
2:00 pm to 4:00 pm
NSH 4305
Towards Modular and Differentiable Autonomous Driving

Abstract:

The classical “modular and cascaded” autonomy stack (object detection, tracking, trajectory prediction, then planning and control) has been widely used for interactive autonomous systems such as self-driving cars due to its interpretability and fast development cycle. In this thesis, we advocate the use of such a modular stack but improve its accuracy and robustness by tightly integrating modules via a differentiable stack.

First, we integrate object detection and tracking through a graph-based social-aware representation that models object relations in both detection and tracking settings, together with an automatic and dynamic detection selection mechanism that better filters detections for downstream tracking. Second, we propose two frameworks to better integrate tracking and prediction: a parallelized tracking and prediction framework that alleviates compounding errors between the two modules, and a multi-hypothesis tracking and prediction framework that makes prediction more robust to inputs containing tracking errors. Third, towards the full integration of detection, tracking, and prediction, we propose an affinity-based prediction framework that directly uses affinity matrices as inputs for prediction. By removing the error-prone data association step, this framework further reduces error propagation.

Beyond following the conventional order of the perception-then-prediction stack, we explore an inverted prediction-then-perception pipeline. With the order inverted, prediction is performed directly on the input sensor data (e.g., point clouds), which does not require expensive labels for training and can therefore improve scalability. To tackle point cloud forecasting, the first step of this pipeline, we first develop a deterministic LSTM autoencoder architecture as a proof of concept; however, it cannot handle the inherent uncertainty of the future. We therefore propose a conditional variational recurrent neural network to account for future uncertainty in point cloud forecasting. Moreover, since learning to predict future sensor data can lead to predictive representations that encode the dynamics of the world state, we integrate self-supervised point cloud prediction into an end-to-end driving policy, which shows state-of-the-art closed-loop performance on the CARLA driving challenge.

Thesis Committee Members:
Kris Kitani, Chair
Matthew P. O’Toole
Deva Ramanan
Marco Pavone, Stanford University
