
MSR Thesis Defense

Zhanyi Sun, MSR Student / Graduate Research Assistant, Robotics Institute, Carnegie Mellon University
Tuesday, June 11
10:00 am to 11:30 am
1305 Newell Simon Hall
Leveraging Vision, Force Sensing, and Language Feedback for Deformable Object Manipulation

Deformable object manipulation represents a significant challenge in robotics due to its complex dynamics, lack of low-dimensional state representations, and severe self-occlusions. This challenge is particularly critical in assistive tasks, where safe and effective manipulation of various deformable materials can significantly improve the quality of life for individuals with disabilities and address the growing needs of an aging society. This thesis studies both a specific application, robot-assisted dressing, and a general framework for reinforcement learning-based manipulation of deformable objects.

In the first part, we present a robot-assisted dressing system that handles diverse garments, human body shapes, and arm poses using reinforcement learning. By employing partial point cloud observations, policy distillation, and guided domain randomization, this work demonstrates effective policy learning that generalizes across various real-world assistive dressing scenarios. To enhance the safety and comfort of the dressing system, we further propose a novel multi-modal learning framework that combines vision and force sensing: a vision-based reinforcement learning policy trained in simulation is paired with a force dynamics model trained on real robot data to infer actions that facilitate the dressing process without applying excessive force to the person.
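As a rough illustration of how such a multi-modal controller could select actions, the sketch below samples candidate actions from a vision-based policy and uses a force dynamics model to reject candidates predicted to exert excessive force on the person. The class interfaces, the candidate-filtering scheme, and the force limit are hypothetical placeholders for illustration, not the system described in the thesis.

```python
import numpy as np

class VisionPolicy:
    """Hypothetical stand-in for the simulation-trained, vision-based policy."""
    def sample_actions(self, point_cloud, n=16):
        # Placeholder: sample small end-effector displacement candidates.
        return np.random.uniform(-0.01, 0.01, size=(n, 3))

class ForceDynamicsModel:
    """Hypothetical stand-in for a force dynamics model trained on real data."""
    def predict_force(self, force_history, actions):
        # Placeholder: predicted force grows with action magnitude.
        return np.linalg.norm(actions, axis=1) * 100.0

def select_safe_action(policy, force_model, point_cloud, force_history,
                       force_limit=5.0):
    """Choose a candidate action whose predicted force stays under the limit.

    If no candidate satisfies the limit, fall back to the candidate with the
    lowest predicted force.
    """
    candidates = policy.sample_actions(point_cloud)
    predicted = force_model.predict_force(force_history, candidates)
    safe = np.flatnonzero(predicted <= force_limit)
    if safe.size > 0:
        return candidates[safe[np.argmin(predicted[safe])]]
    return candidates[int(np.argmin(predicted))]
```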

In the second part, we propose a novel framework that automatically generates reward functions for agents to learn new tasks by leveraging feedback from vision language foundation models. Because our method requires only a text description of the task goal and the agent’s visual observations, it bypasses the need for low-dimensional state representations of objects, making it particularly well suited to deformable object manipulation tasks.
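One way such foundation-model feedback could be converted into a scalar reward signal is through pairwise preference queries, sketched below. The query function, its stubbed response, and the preference-counting scheme are illustrative assumptions rather than the exact mechanism proposed in the thesis.

```python
import random

def query_vlm_preference(task_text, image_a, image_b):
    """Ask a vision-language model which observation better achieves the goal.

    Returns 0 if image_a is preferred, 1 otherwise. A real system would send
    the task description and both images to a VLM; this stub answers randomly
    so the sketch stays self-contained and runnable.
    """
    return random.randint(0, 1)

def label_rewards(task_text, images):
    """Turn pairwise VLM preferences over observations into reward labels.

    Each observation earns +1 whenever the VLM prefers it over a sampled
    peer; the accumulated counts can serve as targets for training a reward
    model used by a downstream reinforcement learning agent.
    """
    rewards = [0.0] * len(images)
    for _ in range(len(images)):
        i, j = random.sample(range(len(images)), 2)
        winner = i if query_vlm_preference(task_text, images[i], images[j]) == 0 else j
        rewards[winner] += 1.0
    return rewards

# Example usage with placeholder observation filenames:
labels = label_rewards("pull the sleeve over the person's arm",
                       ["obs_0.png", "obs_1.png", "obs_2.png"])
```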

Committee:
Prof. David Held (chair)
Prof. Zackory Erickson (co-chair)
Prof. Andrea Bajcsy
Xianyi Cheng