Unlocking Generalization for Robotics via Scale and Modularity - Robotics Institute Carnegie Mellon University

PhD Thesis Defense

Murtaza Dalal, PhD Student, Robotics Institute, Carnegie Mellon University
Friday, January 31
2:30 pm to 4:30 pm
GHC 4405
Unlocking Generalization for Robotics via Scale and Modularity
Abstract:
How can we build generalist robot systems? In fields such as vision and language, the common theme has been large-scale end-to-end learning with massive, curated datasets. In robotics, on the other hand, scale alone may not be enough due to the significant multimodality of robotics tasks, the lack of easily accessible data and the safety and reliability challenges of deploying on physical hardware. Meanwhile, some of the most successfully deployed robotic systems today are inherently modular and can leverage the independent generalization capabilities of each module to perform well. Inspired by these qualities, this thesis seeks to build generalist robot agents by bringing the two approaches together: combining modularity with large-scale learning for general-purpose robot control.

We begin by exploring these two aspects independently. The first question we consider is: how can we build modularity and hierarchy into learning systems? Our key insight is that rather than having the agent learn hierarchy and low-level control end-to-end, we can explicitly enforce modularity via planning to enable significantly more efficient and capable robot learners. Next, we come to the role of scale in building generalist robot systems. To effectively scale, neural networks require vast amounts of diverse data, expressive architectures to fit the data and a source of supervision to generate the data. To that end, we leverage a powerful supervision source: classical planning algorithms, which can generalize broadly, but are expensive to run and require access to perfect, privileged information to perform well in practice. We use these planning algorithms to supervise large-scale policy learning in simulation to produce generalist agents.

Finally, we consider how to unify modularity with large-scale policy learning to build autonomous real-world robot systems capable of performing zero-shot long-horizon manipulation. We propose to do so by tightly integrating four key ingredients: modular high- and mid-level planning, learned local control, procedural scene generation and large-scale policy learning for sim2real transfer. We demonstrate that this recipe can produce powerful results: a single, generalist agent can solve challenging long-horizon manipulation tasks in the real world, solely from text instructions.

Thesis Committee Members:
Ruslan Salakhutdinov (Chair)
Deepak Pathak
Dave Held
Shuran Song (Stanford University)
Ankur Handa (NVIDIA)