Redefining the Perception-Action Interface: Visual Action Representations for Contact-Centric Manipulation

PhD Thesis Defense

Thomas Weng, PhD Student, Robotics Institute, Carnegie Mellon University
Friday, August 25
3:00 pm to 5:00 pm
GHC 6501

Abstract: 

In robotics, understanding the link between perception and action is pivotal. Typically, perception systems process sensory data into state representations such as segmentations and bounding boxes, which a planner then uses to select actions. However, this state estimation approach can fail under partial observability and with objects that have challenging properties such as transparency and deformability. Alternatively, visuomotor policies convert raw sensor input directly into actions, but the actions they produce are not grounded in contact, and they perform poorly in unseen task configurations.

To address these shortcomings, we investigate visual action representations, in which the perception system conveys the affordances permitted by the environment. These affordances represent potential interactions in an object-centric and contact-centric manner: where to make contact with an object, how to approach the contact points, and how to manipulate the object once contact is made. By grounding actions in contact-based affordances, this approach makes action planning more straightforward.
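To make the idea concrete, below is a minimal, hypothetical sketch (in Python) of a contact-centric affordance as a data structure passed from perception to planning; the class, fields, and helper are illustrative assumptions, not the actual interfaces from the thesis.

# Hypothetical contact-centric affordance record; names and fields are
# illustrative assumptions, not the interfaces used in the thesis.
from dataclasses import dataclass
import numpy as np

@dataclass
class ContactAffordance:
    contact_point: np.ndarray        # (3,) where to make contact, in the robot's frame
    approach_dir: np.ndarray         # (3,) unit vector: how to approach the contact point
    post_contact_motion: np.ndarray  # (T, 3) waypoints for moving the object after contact
    score: float                     # predicted quality/confidence of this affordance

def select_affordance(affordances):
    # A trivial "planner": execute the highest-scoring predicted affordance.
    return max(affordances, key=lambda a: a.score)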

This thesis examines visual action representations for addressing visual and geometric challenges in manipulation. We introduce algorithms for cloth manipulation, starting with a method that determines precise grasp points on cloth edges and corners (IROS ’20). In subsequent work, we propose FabricFlowNet, a policy that predicts both where to grasp and how to fold for bimanual cloth folding (CoRL ’21). We explore tactile sensing as another affordance modality and train a tactile classifier for precise cloth layer grasping (IROS ’22). For grasping rigid objects, we devise a transfer learning method for transparent and specular objects (RA-L + ICRA ’20) and introduce Neural Grasp Distance Fields for 6-DOF grasping and motion planning (ICRA ’23).

Thesis Committee Members:

David Held, Chair
Oliver Kroemer
Shubham Tulsiani
Alberto Rodriguez (MIT)
