Name: Teaching Robots to Drive: Scalable Policy Improvement via Human Feedback
Start: 2024-09-06T10:00:00-04:00
End: 2024-09-06T11:30:00-04:00
Location: NSH 3305

Brian Yang
PhD Student, Robotics Institute,
Carnegie Mellon University

Friday, September 6
10:00 am to 11:30 am
NSH 3305

Teaching Robots to Drive: Scalable Policy Improvement via Human Feedback

Abstract:
A long-standing problem in autonomous driving is grappling with the long tail of rare scenarios for which little or no data is available. Although learning-based methods scale with data, it is unclear whether simply ramping up data collection will eventually make this problem go away. Approaches that rely on simulation or world modeling offer some relief, but building such models is very challenging and is itself an active area of research.

On the other hand, humans can learn to drive without millions of logged driving miles or the ability to precisely predict the trajectories of all dynamic actors in the scene. This suggests a potential alternative path to learning robust driving policies that does not rely on highly accurate world models or enormous driving datasets: one that leans into human preferences and expertise as an untapped source of supervision for training driving policies.

This thesis aims to make the case for human feedback as a rich signal for improving driving policies in a sample-efficient manner without requiring high-fidelity simulation. First, we propose a method for guiding driving policies at test time using unseen black-box reward functions. We can then synthesize reward functions using natural language and optimize them online, allowing us to solve novel tasks zero-shot using only language supervision. Next, we show how driving policies can be fine-tuned offline using human preference data. By eliciting preferences over high-level intents, we can use human feedback to effectively relabel sub-optimal driving demonstrations and improve on-road driving performance. As future work, we aim to combine these two methods to fine-tune driving policies offline using natural language corrections, which should enable richer feedback over longer horizons and chain-of-thought distillation.

Thesis Committee Members:
Katerina Fragkiadaki, Co-chair
Jeff Schneider, Co-chair
Maxim Likhachev
Philipp Krähenbühl, The University of Texas at Austin

PhD Thesis Proposal