Learning to Manipulate Using Diverse Datasets

PhD Thesis, Tech. Report, CMU-RI-TR-24-30, July, 2024

Abstract

Autonomous agents can play games (like Chess, Go, and even StarCraft), they can help make complex scientific predictions (e.g., protein folding), and they can even write entire computer programs with just a bit of prompting. However, even the most basic physical manipulation skills, like unlocking and opening a door, remain literally out of reach. The key challenge is acquiring the manipulation primitives themselves: there are infinitely many objects and environments in this world that a robot will have to interact with. Even worse, physics is unforgiving, and even small errors can cause a task to fail entirely. In this thesis, I adopt a data-driven approach to address this challenge. Instead of hard-coding or planning actions within a known environment, I will explore methods/algorithms that acquire policies from increasingly scalable sources of offline data. The first section will demonstrate how highly effective policies can be learned from expert demonstrations and high-capacity neural networks. This work establishes the viability of data-driven policy learning for manipulation tasks, but requires the most expensive form of data to collect. Thus, the next part loosens the assumptions and demonstrates how diverse data can be collected across multiple institutions and used to boost performance even in specific domains/tasks. The third section pushes this same philosophy even further, and shows how human data can be used to improve robot policies. This is accomplished via representation learning: human data is used to learn robotic representations using various contrastive, self-supervised, and semi-supervised algorithms that transfer strongly to downstream manipulation tasks. Finally, I look ahead and discuss how these methods can continue to scale by leveraging larger datasets and massive pre-trained vision/language models.

BibTeX

@phdthesis{Dasari-2024-141989,
author = {Sudeep Dasari},
title = {Learning to Manipulate Using Diverse Datasets},
year = {2024},
month = {July},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-24-30},
keywords = {Robot Learning, Manipulation, Perception},
}
