Deep 3D Geometric Reasoning for Robot Manipulation

PhD Thesis Proposal

Benjamin Eisner
PhD Student, Robotics Institute, Carnegie Mellon University
Friday, April 5
9:30 am to 11:00 am
GHC 4405

Abstract:
To solve general manipulation tasks in real-world environments, robots must be able to perceive the 3D world and condition their manipulation policies on it. These agents will need to understand several common-sense spatial and geometric concepts about manipulation tasks: that local geometry can suggest potential manipulation strategies, that policies should be invariant to the choice of reference frame, that policies should adapt when object configurations change, and so on. While these properties may one day be learned implicitly through large-scale data collection and online experience, this investigation explores learning algorithms and visual representations that can imbue agents with generalizable geometric reasoning capabilities while learning from only a small number of demonstrations or examples.

We first explore how agents can learn generalizable 3D affordance representations for articulated objects such as doors and drawers. We propose a set of 3D visual representations that describe the motion constraints of every point on an articulated object. We demonstrate that, when trained on a small dataset of simulated articulated objects, these representations generalize zero-shot to novel instances of seen object categories, to entirely unseen object categories, and even to real-world sensor data. We will also describe ongoing work on generating robust affordance predictions under both aleatoric and epistemic uncertainty.
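To make the idea of per-point motion constraints concrete, the following is a minimal, hypothetical sketch of a per-point predictor: a PointNet-style network that maps an object point cloud to a unit 3D motion direction for each point. The architecture, class name, and dimensions are illustrative assumptions, not the proposal's actual model.

```python
# Hypothetical sketch: given an (N, 3) point cloud of an articulated object,
# predict a unit 3D direction per point indicating how that point could move
# under articulation. Illustrative PointNet-style architecture only; this is
# not the proposal's actual model.
import torch
import torch.nn as nn


class PerPointMotionPredictor(nn.Module):
    def __init__(self, hidden: int = 128):
        super().__init__()
        # Shared per-point encoder, applied independently to every point.
        self.encoder = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Decoder conditions each point feature on a global max-pooled feature,
        # then regresses a 3D motion direction for that point.
        self.decoder = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (B, N, 3) point cloud -> (B, N, 3) unit motion directions.
        feats = self.encoder(points)                          # (B, N, H)
        global_feat = feats.max(dim=1, keepdim=True).values   # (B, 1, H)
        global_feat = global_feat.expand_as(feats)            # (B, N, H)
        flow = self.decoder(torch.cat([feats, global_feat], dim=-1))
        return torch.nn.functional.normalize(flow, dim=-1)


# Usage: predict per-point motion directions for a batch of sampled point clouds.
model = PerPointMotionPredictor()
cloud = torch.randn(2, 1024, 3)
directions = model(cloud)  # (2, 1024, 3), unit vectors
```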

Next, we explore how agents can learn task-critical geometric relationships for object rearrangement tasks from a small number of demonstrations. We design a set of dense 3D representations that learn correspondence relationships across objects, precisely extract the desired rigid-body transformations using novel reasoning layers, and exhibit desirable invariance and equivariance properties under scene transformations. We will also describe ongoing work to scale this paradigm to the large-scale RLBench manipulation benchmark, extend this style of reasoning to non-rigid objects, and incorporate high-dimensional goal proposal into a long-horizon planning framework.
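As one standard way to extract a rigid-body transformation from dense, weighted correspondences, consider a weighted Kabsch (Procrustes) solve, sketched below in NumPy. This illustrates the general technique only; the function name, correspondence source, and weighting scheme are assumptions, not the proposal's actual reasoning layers.

```python
# Sketch of a weighted Kabsch / Procrustes solve for recovering a rigid-body
# transform from corresponding point sets. Illustrative only; not the
# proposal's actual reasoning layer.
import numpy as np


def weighted_rigid_transform(src, tgt, weights):
    """Solve for R, t minimizing sum_i w_i * ||R @ src_i + t - tgt_i||^2.

    src, tgt: (N, 3) corresponding point sets; weights: (N,) nonnegative.
    Returns (R, t) with R a proper rotation (det(R) = +1).
    """
    w = weights / weights.sum()
    mu_src = (w[:, None] * src).sum(axis=0)        # weighted centroids
    mu_tgt = (w[:, None] * tgt).sum(axis=0)
    src_c, tgt_c = src - mu_src, tgt - mu_tgt      # centered points
    H = (src_c * w[:, None]).T @ tgt_c             # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Reflection correction keeps R a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_tgt - R @ mu_src
    return R, t


# Usage: recover a known transform from noiselessly transformed points.
rng = np.random.default_rng(0)
pts = rng.normal(size=(100, 3))
angle = np.pi / 4
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.1, -0.2, 0.3])
R_est, t_est = weighted_rigid_transform(pts, pts @ R_true.T + t_true, np.ones(100))
assert np.allclose(R_est, R_true, atol=1e-6) and np.allclose(t_est, t_true, atol=1e-6)
```

In a learned pipeline, the weights would typically come from predicted correspondence confidences, and the solve can be made differentiable so that the full system trains end-to-end from demonstrations.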

Finally, we will propose two categories of remaining work and solicit feedback on these directions: 1) connecting these 3D reasoning capabilities to policy learning, and 2) exploring how such representations can be learned efficiently by watching human demonstrations in the real world.

Thesis Committee Members:
David Held, Chair
Shubham Tulsiani
Oliver Kroemer
Jon Scholz, Google DeepMind
Yuke Zhu, The University of Texas at Austin