Exploration for Continually Improving Robots - Robotics Institute Carnegie Mellon University

PhD Thesis Defense

Russell Mendonca
PhD Student, Robotics Institute, Carnegie Mellon University
Monday, September 9
2:30 pm to 4:00 pm
NSH 4305
Exploration for Continually Improving Robots

Abstract:
Data-driven learning is a powerful paradigm for enabling robots to learn skills. Current prominent approaches collect large datasets of robot behavior via teleoperation or simulation, and then train policies on them. For these policies to generalize to diverse tasks and scenes, a large burden is placed on constructing a rich initial dataset, which is bottlenecked by the human labor required to collect demonstrations or to carefully design simulation assets and scenes. Can we instead enable robots to learn how to collect their own data for continual improvement? This thesis tackles this question of exploration, which directs how agents should act and leads to the discovery of useful behavior.

We first consider how to define exploration objectives even in the absence of rewards or demonstrations. To explore new goals, our key insight is that it is easier to identify action sequences that lead to some unknown goal state than to generate the unknown goal directly. This is enabled by training a world model that can measure the uncertainty of action sequences. For further efficiency in real-world deployment, we decouple environment-centric and agent-centric exploration. The former incentivizes actions that change the visual features of objects, which is often beneficial for manipulation tasks, while the latter targets the uncertainty of the robot's internal world model.
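To make the idea concrete, below is a minimal sketch of uncertainty-driven exploration, assuming the world model is an ensemble of learned dynamics models and that the uncertainty of an action sequence is measured by how much the ensemble members disagree over the imagined rollout. The toy linear models and names are illustrative placeholders, not the thesis implementation.

```python
# Minimal sketch of disagreement-based exploration over action sequences.
# Assumption (illustrative): the world model is an ensemble of toy latent
# dynamics models f_i(s, a) -> s', and the exploration score of an action
# sequence is the ensemble's variance over the imagined rollout.
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, ENSEMBLE, HORIZON, CANDIDATES = 4, 2, 5, 10, 64

# Toy ensemble: each member is a random linear dynamics model (stand-in for a
# learned neural world model).
ensemble = [
    (rng.normal(scale=0.5, size=(STATE_DIM, STATE_DIM)),
     rng.normal(scale=0.5, size=(STATE_DIM, ACTION_DIM)))
    for _ in range(ENSEMBLE)
]

def rollout(member, state, actions):
    """Imagine a trajectory under one ensemble member."""
    A, B = member
    states = []
    for a in actions:
        state = np.tanh(A @ state + B @ a)
        states.append(state)
    return np.stack(states)                      # (HORIZON, STATE_DIM)

def disagreement(state, actions):
    """Exploration score: variance across the ensemble's imagined rollouts."""
    trajs = np.stack([rollout(m, state, actions) for m in ensemble])
    return trajs.var(axis=0).mean()

# Score random candidate action sequences and pick the one the ensemble
# disagrees about most, i.e. the most informative sequence to try.
state = np.zeros(STATE_DIM)
candidates = rng.uniform(-1, 1, size=(CANDIDATES, HORIZON, ACTION_DIM))
scores = np.array([disagreement(state, seq) for seq in candidates])
best_sequence = candidates[scores.argmax()]
print("best exploration score:", scores.max())
```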

Next, we ask how to enable generalist robot explorers for diverse tasks. Our approach is to learn data-driven priors from human videos to structure the action space. We learn visual affordances, which characterize how objects can be interacted with by hands or end-effectors, providing a very efficient search space for exploration. Further, this shared affordance action space can be used to train a joint human-robot world model. The model is first pre-trained on diverse video of human hands performing various tasks, and then fine-tuned with very few robot exploration trajectories. We also study how to efficiently adapt generation from internet-scale video diffusion models using the gradient of a given reward function, which can enable future applications that leverage such models for planning in robotics.
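As a rough illustration of reward-gradient adaptation, the sketch below nudges each reverse-diffusion step along the gradient of a differentiable reward, in the spirit of classifier-style guidance. The toy denoiser and reward function are assumptions for illustration, not the actual video diffusion model or objective.

```python
# Minimal sketch of reward-gradient guided diffusion sampling.
# Assumptions (illustrative): `denoise` stands in for a pretrained video
# diffusion model's denoising step, and `reward` is a differentiable reward on
# the current sample. Each reverse step is nudged along the reward gradient,
# steering generation toward high-reward outputs without retraining the model.
import torch

def denoise(x, t):
    # Toy denoiser: shrinks the sample toward the origin (placeholder for a
    # learned denoising network).
    return x * (1.0 - 0.1 * t)

def reward(x):
    # Toy differentiable reward: prefer samples close to a target pattern.
    target = torch.ones_like(x)
    return -((x - target) ** 2).mean()

def guided_sample(shape, steps=50, guidance_scale=0.5, noise_scale=0.05):
    x = torch.randn(shape)
    for step in range(steps, 0, -1):
        t = step / steps
        x = denoise(x, t)
        # Reward guidance: ascend the reward gradient w.r.t. the sample.
        x = x.detach().requires_grad_(True)
        grad = torch.autograd.grad(reward(x), x)[0]
        x = (x + guidance_scale * t * grad).detach()
        if step > 1:
            x = x + noise_scale * t * torch.randn_like(x)
    return x

sample = guided_sample((1, 4, 8, 8))   # e.g. a tiny (frames, H, W) "video"
print("final reward:", reward(sample).item())
```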

The third question we consider is how to enable greater autonomy for robot explorers. We do so using mobile manipulation systems, as their extended feasible task space and ability to reset their own workspace allow for continual practice and improvement with minimal human involvement. We demonstrate this on a quadruped+arm system that learns to move chairs, sweep trash, and stand dustpans upright via real-world RL, as well as a wheeled base+arm system that learns to open doors across various buildings on campus. Finally, orthogonal to the question of exploration, we discuss how to scale data collection for bimanual dexterous manipulation using low-cost, high-fidelity teleoperation, and how to learn neural motion planners for robot arms using procedural scene generation in simulation, in order to obtain better initial policies from which robots can explore.
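A minimal sketch of such an autonomous practice loop is shown below, assuming a Gym-style interface to the robot and a routine by which the robot resets its own workspace. The stub environment and policy here are hypothetical stand-ins, not the actual thesis systems.

```python
# Minimal sketch of a reset-free, real-world practice loop for a mobile
# manipulator. Assumptions (illustrative): `Env` stands in for the robot/task
# interface, `Policy` for the learned controller, and the robot's own mobility
# is used to restore the scene (e.g. pushing a chair back) between episodes,
# so practice continues with minimal human involvement.
import random

class Env:
    """Toy stand-in for the real robot/task interface (Gym-style)."""
    def reset(self):
        self.t = 0
        return 0.0
    def step(self, action):
        self.t += 1
        obs = random.random()
        reward = float(action > 0.5)             # placeholder success signal
        done = self.t >= 20
        return obs, reward, done, {}
    def robot_reset_scene(self):
        pass                                     # robot restores the workspace itself

class Policy:
    """Toy stand-in for an RL policy with an off-policy update."""
    def act(self, obs):
        return random.random()
    def update(self, buffer):
        pass                                     # e.g. actor-critic update from replay

def practice(env, policy, num_episodes=10):
    buffer = []
    for _ in range(num_episodes):
        obs, done = env.reset(), False
        while not done:
            action = policy.act(obs)
            next_obs, reward, done, _ = env.step(action)
            buffer.append((obs, action, reward, next_obs, done))
            obs = next_obs
        policy.update(buffer)                    # improve from the robot's own data
        env.robot_reset_scene()                  # no human reset needed
    return policy

practice(Env(), Policy())
```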

Thesis Committee Members:
Deepak Pathak, Chair
Abhinav Gupta
Ruslan Salakhutdinov
Sergey Levine, UC Berkeley
Dorsa Sadigh, Stanford
