Abstract:
The goal of this thesis is to develop AI methods that enhance cross-domain and cross-task generalization in intelligent robot agents. Unlike the dominant approach in contemporary robot learning, which pursues generalization primarily through scaling (increasing data and model size), we focus on identifying effective abstractions and representations for both perception and policy learning.
We begin by introducing a factorized policy learning scheme that leverages perception and understanding bottlenecks to create abstract scene and task representations. Building on these abstractions, the scheme plans potential solutions and guides policies to execute them.
In the chapters that follow, we seek to improve both policy learning and the inference of abstractions. We first present 3D Diffuser Actor, a general 3D manipulation policy built on equivariant 3D representations. 3D Diffuser Actor reframes 3D manipulation as a denoising problem and achieves substantial improvements across a wide range of tasks and benchmarks.
Next, we turn to perception models that provide useful abstractions for manipulation. We focus on memory-prompted neural networks for segmentation, which exhibit emergent properties such as the unsupervised discovery of 3D correspondences across scenes. This approach enables rapid test-time adaptation without weight updates.
Finally, we propose a general analogy-driven framework for robot problem-solving that extends memory prompting to manipulation tasks. Specifically, we explore how to automatically discover abstractions for novel tasks from a few demonstrations, which can then prompt generative-model-based planners and policies. In parallel, we introduce a policy-in-the-loop data generation pipeline designed to address corner cases that challenge our policies.
Thesis Committee Members:
Katerina Fragkiadaki, Chair
Yonatan Bisk
Shubham Tulsiani
Abhishek Gupta, University of Washington