
We begin by exploring these two aspects independently. The first question we consider is: how can we build modularity and hierarchy into learning systems? Our key insight is that rather than having the agent learn hierarchy and low-level control end-to-end, we can explicitly enforce modularity via planning, enabling significantly more efficient and capable robot learners. Next, we turn to the role of scale in building generalist robot systems. To scale effectively, neural networks require vast amounts of diverse data, expressive architectures to fit that data, and a source of supervision to generate it. To that end, we leverage a powerful supervision source: classical planning algorithms, which generalize broadly but are expensive to run and require access to perfect, privileged information to perform well in practice. We use these planning algorithms to supervise large-scale policy learning in simulation, producing generalist agents.
Finally, we consider how to unify modularity with large-scale policy learning to build autonomous real-world robot systems capable of performing zero-shot, long-horizon manipulation. We propose to do so by tightly integrating several key ingredients: modular high- and mid-level planning, learned local control, procedural scene generation, and large-scale policy learning for sim2real transfer. We demonstrate that this recipe can produce powerful results: a single generalist agent can solve challenging long-horizon manipulation tasks in the real world, solely from a text instruction.
Thesis Committee Members:
Ruslan Salakhutdinov (Chair)
Deepak Pathak
Dave Held
Shuran Song (Stanford University)
Ankur Handa (NVIDIA)