Online-Adaptive Self-Supervised Learning with Visual Foundation Models for Autonomous Off-Road Driving

Master's Thesis, Tech. Report, CMU-RI-TR-24-57, August, 2024

Abstract

Autonomous robot navigation in off-road environments remains challenging. The lack of structure makes it difficult to handcraft geometry-based heuristics that are robust to the diverse scenarios a robot might encounter. Many of the learned methods that work well in urban scenarios require massive amounts of hand-labeled data, but the nuances of deciding where a robot can and cannot drive in off-road terrain make large-scale labeling of this kind difficult. Many state-of-the-art approaches instead rely on self-supervised training, using either expert demonstrations or proprioceptive feedback, but they often still require large amounts of data and can be vulnerable to domain shift.

We adopt the philosophy that learned methods for off-road driving should be both self-supervised and adaptive, so that the robot can learn online without a human in the loop. In this work, we propose a method that leverages proprioceptive cues and pre-trained visual foundation models to rapidly adjust its understanding of its environment in real time, eliminating the need for large-scale training data and hand labels. Specifically, we introduce a framework that predicts costmaps, speedmaps, and uncertainty by associating incoming visual features with the roughness experienced by the system. Our results demonstrate that, within seconds of collected experience, the method achieves navigation performance with as few interventions as methods trained on 100-1000x more data, while travelling as quickly as possible within the constraints of rider comfort. Furthermore, we aim to lower the barrier to entry for full-scale off-road driving research by presenting TartanDrive 2.0, a large multi-modal dataset geared towards self-supervised learning methods.

BibTeX

@mastersthesis{Sivaprakasam-2024-142582,
author = {Matthew Sivaprakasam},
title = {Online-Adaptive Self-Supervised Learning with Visual Foundation Models for Autonomous Off-Road Driving},
year = {2024},
month = {August},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-24-57},
keywords = {self-supervised, online learning, dataset, multi-modal, off-road},
}
