Exploration for Continually Improving Robots - Robotics Institute Carnegie Mellon University

Exploration for Continually Improving Robots

PhD Thesis, Tech. Report, CMU-RI-TR-24-62, September, 2024

Abstract

Data-driven learning is a powerful paradigm for enabling robots to learn skills. Current prominent approaches collect large datasets of robot behavior via teleoperation or simulation and then train policies on them. For these policies to generalize to diverse tasks and scenes, a large burden is placed on constructing a rich initial dataset, which is bottlenecked by the human labor required to collect demonstrations or to carefully design simulation assets and scenes. Can we instead enable robots to learn how to collect their own data for continual improvement? This thesis tackles this question of exploration, which directs how agents should act so that they discover useful behavior.

We first consider how to define exploration objectives in the absence of rewards or demonstrations. To explore new goals, our key insight is that it is easier to identify action sequences that lead to some unknown goal state than to generate the unknown goal directly. This is enabled by training a world model that can be used to measure the uncertainty of action sequences. For further efficiency in real-world deployment, we decouple environment-centric and agent-centric exploration. The former incentivizes actions that change the visual features of objects, which is often beneficial for manipulation tasks; the latter relies on the uncertainty of the robot’s internal world model.
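As a rough illustration of scoring action sequences by world-model uncertainty, the sketch below uses disagreement within an ensemble of dynamics models as the uncertainty measure. This is an assumption made for illustration, not necessarily the measure used in the thesis; the random linear models stand in for a learned world model, and all function names are hypothetical.

```python
import numpy as np

def make_ensemble(n_models, state_dim, action_dim, seed=0):
    """Hypothetical stand-in for a learned world model: random linear dynamics."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(scale=0.3, size=(state_dim, state_dim)),
         rng.normal(scale=0.3, size=(state_dim, action_dim)))
        for _ in range(n_models)
    ]

def sequence_uncertainty(ensemble, state, actions):
    """Sum, over time steps, of the variance across ensemble predictions."""
    # Each ensemble member rolls out its own predicted state trajectory.
    states = np.stack([state.copy() for _ in ensemble])
    total = 0.0
    for a in actions:
        states = np.stack([A @ s + B @ a for (A, B), s in zip(ensemble, states)])
        total += states.var(axis=0).sum()
    return float(total)

def pick_exploratory_sequence(ensemble, state, candidates):
    """Return the index of the candidate the ensemble disagrees on most."""
    scores = [sequence_uncertainty(ensemble, state, seq) for seq in candidates]
    return int(np.argmax(scores)), scores

rng = np.random.default_rng(1)
ensemble = make_ensemble(n_models=5, state_dim=4, action_dim=2)
state = np.zeros(4)
candidates = rng.normal(size=(16, 10, 2))  # 16 sequences, horizon 10
best, scores = pick_exploratory_sequence(ensemble, state, candidates)
print(f"most uncertain sequence: {best}")
```

Note that no goal state is ever generated here: the agent only ranks candidate action sequences, which is the asymmetry the paragraph above describes.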

Next, we ask how to enable generalist robot explorers for diverse tasks. Our approach is to learn data-driven priors from human videos to structure the action space. We learn visual affordances, which characterize how objects can be interacted with by hands or end-effectors and provide an efficient search space for exploration. Further, this shared affordance action space can be used to train a joint human-robot world model. The model is first pre-trained on diverse videos of human hands performing various tasks and then fine-tuned with very few robot exploration trajectories. We also study how to efficiently adapt internet-scale video diffusion models using gradient information from a given reward function, which can enable future applications that use such models for planning in robotics.
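To make the idea of an affordance-structured action space concrete, here is a minimal sketch that samples exploratory actions from an affordance heatmap rather than uniformly over the image. The heatmap, the (contact pixel, post-contact direction) action format, and the function name are assumptions made for illustration; in practice the heatmap would come from a model trained on human videos.

```python
import numpy as np

def sample_affordance_actions(heatmap, n_samples, rng):
    """Sample contact pixels in proportion to affordance scores,
    plus a random unit direction to move in after contact."""
    probs = heatmap.ravel() / heatmap.sum()
    flat_idx = rng.choice(heatmap.size, size=n_samples, p=probs)
    rows, cols = np.unravel_index(flat_idx, heatmap.shape)
    contacts = np.stack([rows, cols], axis=1)
    # Assumption: post-contact directions are drawn uniformly at random here.
    angles = rng.uniform(0.0, 2.0 * np.pi, size=n_samples)
    directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    return contacts, directions

rng = np.random.default_rng(0)
heatmap = np.zeros((8, 8))
heatmap[2:4, 5:7] = 1.0  # hypothetical high-affordance region, e.g. a handle
contacts, directions = sample_affordance_actions(heatmap, 32, rng)
```

Every sampled contact falls inside the high-affordance region, so the explorer spends no trials on pixels where interaction is unlikely to change anything.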

The third question we consider is how to enable greater autonomy for robot explorers. We do so using mobile manipulation systems, as their extended feasible task space and resetting ability allow for continual practice and improvement with minimal human involvement. We show this on a quadruped equipped with an arm that learns to move chairs, sweep trash, and stand dustpans upright via real-world RL, as well as on a custom wheeled system that learns to open doors across various buildings on campus. Finally, orthogonal to the question of exploration, we discuss how to scale data collection in two ways: low-cost, high-fidelity teleoperation for bimanual dexterous manipulation, and procedural scene generation in simulation to learn neural motion planners for robot arms. Both aim to provide better initial policies from which robots can explore.

BibTeX

@phdthesis{Mendonca-2024-143441,
author = {Russell Mendonca},
title = {Exploration for Continually Improving Robots},
year = {2024},
month = {September},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-24-62},
keywords = {Reinforcement Learning, Robot Learning, Manipulation},
}