Abstract:
Autonomous robot navigation in off-road environments presents a number of challenges. The lack of structure makes it difficult to handcraft geometry-based heuristics that are robust to the diverse set of scenarios the robot might encounter. Many of the learned methods that work well in urban scenarios require massive amounts of hand-labeled data, but the nuances of deciding where a robot can and cannot drive in off-road terrain make it difficult to produce large-scale labels in the same way. Many state-of-the-art approaches instead leverage self-supervision in training, using either expert demonstrations or proprioceptive feedback, but they often still require substantial amounts of data and can be vulnerable to domain shift.
We adopt the philosophy that learned methods for off-road driving should be both self-supervised and adaptive, such that the robot can learn online without a human in the loop. In this work we propose a method that leverages proprioceptive cues and pre-trained visual foundation models to rapidly adjust its understanding of its environment in real time, eliminating the need for large-scale training data and hand labeling. Specifically, we introduce a framework that predicts costmaps, speedmaps, and uncertainty by associating incoming visual features with the roughness experienced by the system. With just seconds of collected experience, our results demonstrate navigation performance with as few interventions as methods trained on 100-1000x more data, while traveling as quickly as possible within the constraints of rider comfort. Furthermore, we aim to reduce the barrier to entry for full-scale off-road driving research by presenting TartanDrive 2.0, a large multi-modal dataset geared towards self-supervised learning methods.
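To make the core idea concrete, the sketch below illustrates one simple way to associate visual features with experienced roughness online; it is not the authors' implementation. The feature dimension, the kernel-regression formulation, the uncertainty proxy, and the choice of roughness signal are all illustrative assumptions.

```python
# Minimal sketch (assumed details, not the proposed framework's code):
# online association of visual patch features with experienced roughness,
# producing a cost and a support-based uncertainty per query feature.
import numpy as np


class OnlineRoughnessRegressor:
    """Maps visual features to a traversal cost with a simple uncertainty estimate."""

    def __init__(self, feat_dim=384, bandwidth=1.0, max_buffer=2048):
        self.feats = np.empty((0, feat_dim))  # features of patches the robot has driven over
        self.rough = np.empty((0,))           # roughness measured while traversing them
        self.bandwidth = bandwidth
        self.max_buffer = max_buffer

    def add_experience(self, feat, roughness):
        """Store one (visual feature, proprioceptive roughness) pair."""
        self.feats = np.vstack([self.feats, feat[None]])[-self.max_buffer:]
        self.rough = np.append(self.rough, roughness)[-self.max_buffer:]

    def predict(self, query_feats):
        """Return predicted roughness cost and an uncertainty proxy per query feature."""
        if len(self.rough) == 0:
            return np.zeros(len(query_feats)), np.ones(len(query_feats))
        # Squared distances between query features and stored experience.
        d2 = ((query_feats[:, None, :] - self.feats[None, :, :]) ** 2).sum(-1)
        w = np.exp(-d2 / (2 * self.bandwidth ** 2))         # kernel weights
        cost = (w * self.rough).sum(1) / (w.sum(1) + 1e-8)  # Nadaraya-Watson estimate
        unc = 1.0 / (1.0 + w.sum(1))                        # little support -> high uncertainty
        return cost, unc
```

In a full system, each map cell would carry a foundation-model feature, and the predicted cost and uncertainty over all cells would form the costmap consumed by the planner; a speedmap could be derived analogously by regressing achievable speed instead of roughness.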
Committee:
Sebastian Scherer (advisor)
Wenshan Wang
Samuel Triest