Abstract:
Building a generalist robot capable of performing diverse tasks in unstructured environments remains a longstanding challenge. A recent trend in robot learning aims to address this by scaling up demonstration datasets for imitation learning. However, most large-scale robotics datasets are collected in the real world, often via manual teleoperation. This process is labor-intensive, slow, hardware-dependent, and poses safety risks, limiting its scalability.
Physics-based simulation offers a scalable, safe, and efficient alternative for generating large demonstration datasets. However, two major challenges remain: (1) substantial manual effort is required to design simulation assets and scenes and to create training supervision such as reward functions, and (2) the sim-to-real gap in both sensing and dynamics can hinder real-world deployment of simulation-trained policies.
In this thesis, we explore using simulation to generate large-scale datasets to learn robotic manipulation policies that generalize across diverse objects and environments, while addressing the above challenges. We will discuss the following three lines of work:
1. Large-Scale Sim2Real Transfer: We show that policies trained on large-scale simulation data, when combined with the right policy representation and observation space, can transfer zero-shot to the real world and generalize to diverse scenarios. We demonstrate this on two complex manipulation tasks: robot-assisted dressing and articulated object manipulation.
2. Automating Simulation Dataset Generation: We explore how to automate the creation of large simulation datasets, including tasks, assets, scenes, and training supervision, through a paradigm called generative simulation, and how to generate complex reward functions using feedback from vision-language foundation models.
3. Efficient Adaptation of Sim2Real Policies: No simulation is perfect. We study how to use small amounts of additional real-world data to efficiently improve the performance and safety of simulation-trained policies.
Finally, we will propose two potential directions for future work and solicit feedback on them: 1) in-context imitation learning for fast adaptation from a single demonstration; and 2) learning a multi-task 3D generalist policy by combining diverse, large-scale simulation datasets.
Thesis Committee Members:
Zackory Erickson, co-chair
David Held, co-chair
Katerina Fragkiadaki
Chuang Gan, UMass Amherst and MIT-IBM Watson AI Lab
Dieter Fox, University of Washington and Nvidia