Abstract:
Robotics researchers have been attempting to extend data-driven breakthroughs in fields like computer vision and natural language processing to robot learning. However, unlike vision and language, where massive amounts of data are readily available on the internet, training robotic policies relies on interaction data collected in the physical world, a resource-intensive process constrained by human labor. This data scarcity has long been a major bottleneck in scaling up robot learning systems, confining prior efforts to small-scale, task-specific settings.
In this talk, we present an ongoing paradigm shift that could potentially lead to general-purpose robots by addressing these limitations, and discuss several fundamental components in detail:
We present Generative Simulation, a generative framework for autonomously scaling up robotic data generation by better leveraging the power of compute. Traditional policy training in simulation has long been hindered by the extensive human effort required to design tasks, assets, environments, training supervision, and evaluation metrics. We design a robotic agent that automates all stages of simulated robot learning, from initial task proposal to policy training, yielding diverse robotic demonstrations.
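As a schematic illustration of this pipeline, consider the minimal sketch below. It is hypothetical: every function and class name is an illustrative placeholder rather than the system's actual API, and each stage is stubbed out.

    # Hypothetical sketch of an automated generative-simulation loop.
    # All names below are illustrative placeholders, not the actual API.
    from dataclasses import dataclass, field

    @dataclass
    class TaskSpec:
        description: str                              # natural-language task proposal
        assets: list = field(default_factory=list)    # scene assets for the task

    def propose_task() -> TaskSpec:
        # Stage 1: a generative model (e.g., an LLM) proposes a new task.
        return TaskSpec("push the cube to the target", assets=["table", "cube"])

    def build_scene(assets: list) -> dict:
        # Stage 2: assemble the simulated environment from the proposed assets.
        return {"assets": assets}

    def train_policy(scene: dict, task: TaskSpec):
        # Stages 3-4: synthesize training supervision and train a per-task policy.
        return lambda obs: "noop"                     # stub policy

    def collect_demonstrations(num_tasks: int) -> list:
        # Stage 5: roll out trained policies to harvest diverse demonstrations.
        demos = []
        for _ in range(num_tasks):
            task = propose_task()
            scene = build_scene(task.assets)
            policy = train_policy(scene, task)
            demos.append({"task": task.description, "actions": [policy(None)]})
        return demos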
The simulation capability and efficiency required by the stage above are beyond those of any existing simulation platform. We introduce Genesis, a universal physics engine and simulation platform for embodied AI research, the result of a two-year large-scale collaborative effort involving nearly 20 research labs worldwide. Genesis is developed in pure Python, yet delivers the world’s fastest simulation speed (10-80x faster than existing GPU-accelerated robotics simulators such as Isaac Gym, with a more accurate contact model), and is designed to be fully differentiable and optimized for user-friendliness. Genesis unifies a range of physics solvers (MPM, FEM, PBD, ABD, SPH, etc.), supports physical materials ranging from rigid and articulated bodies to various types of deformable bodies, integrates the world’s fastest photorealistic rendering engine, implements useful sensor modalities such as physics-based tactile sensing, and natively supports generative simulation.
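To give a flavor of the user-facing design, here is a minimal usage sketch based on Genesis's publicly released Python API; treat details such as the backend flag and the bundled asset path as assumptions drawn from the public quickstart rather than a definitive reference.

    import genesis as gs

    # Initialize Genesis on the GPU backend (use gs.cpu if no GPU is available).
    gs.init(backend=gs.gpu)

    # Create a scene and populate it with a ground plane and a Franka arm
    # loaded from an MJCF description bundled with the library.
    scene = gs.Scene(show_viewer=False)
    plane = scene.add_entity(gs.morphs.Plane())
    franka = scene.add_entity(gs.morphs.MJCF(file="xml/franka_emika_panda/panda.xml"))

    # Compile the scene, then step the simulation forward.
    scene.build()
    for _ in range(1000):
        scene.step()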
We present Act3D and ChainedDiffuser, two novel policy architectures for distilling robotic demonstration data into multi-task, multi-modal policies, completing the cycle from data generation to effective policy training.
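As a generic illustration of this distillation step, the sketch below fits a simple feedforward policy to demonstration data via behavior cloning; it is a deliberately plain stand-in and does not reflect the actual Act3D or ChainedDiffuser architectures.

    # Generic behavior-cloning sketch: distill (observation, action) pairs
    # from demonstrations into a policy. Not Act3D/ChainedDiffuser itself.
    import torch
    import torch.nn as nn

    class Policy(nn.Module):
        def __init__(self, obs_dim: int, act_dim: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim, 256), nn.ReLU(),
                nn.Linear(256, 256), nn.ReLU(),
                nn.Linear(256, act_dim),
            )

        def forward(self, obs):
            return self.net(obs)

    # Toy stand-in for demonstration data collected in simulation.
    obs = torch.randn(1024, 32)    # observations
    act = torch.randn(1024, 7)     # expert actions (e.g., 7-DoF end-effector)

    policy = Policy(obs_dim=32, act_dim=7)
    opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
    for epoch in range(100):
        loss = nn.functional.mse_loss(policy(obs), act)
        opt.zero_grad()
        loss.backward()
        opt.step()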
Finally, we will briefly touch on the nature of robotic data generation and discuss how to extend it to a broader concept, “Universal Data Generation”, in which we generate a wide distribution of physically and visually accurate 3D dynamical worlds and use them as a fundamental source from which to extract various modalities of data in a highly controllable manner, hinting at applications in numerous areas beyond robotics.
Thesis Committee Members:
Katerina Fragkiadaki, Chair
David Held
Deepak Pathak
Chuang Gan, UMass & MIT-IBM Lab
Shuran Song, Stanford