
Leveraging Vision, Force Sensing, and Language Feedback for Deformable Object Manipulation

Master's Thesis, Tech. Report CMU-RI-TR-24-32, June 2024

Abstract

Deformable object manipulation represents a significant challenge in robotics due to the objects' complex dynamics, the lack of low-dimensional state representations, and severe self-occlusions. This challenge is particularly critical in assistive tasks, where safe and effective manipulation of diverse deformable materials can significantly improve the quality of life for individuals with disabilities and address the growing needs of an aging society. This thesis studies both a specific application in robot-assisted dressing and a generic framework for reinforcement learning-based manipulation of deformable objects.

In the first part, we present a robot-assisted dressing system capable of handling diverse garments, human body shapes, and arm poses using reinforcement learning. By employing partial point cloud observations, policy distillation, and guided domain randomization, this work demonstrates effective policy learning that generalizes across various real-world assistive dressing scenarios. To enhance the safety and comfort of the dressing system, we further propose a novel multi-modal learning framework with vision and force sensing. We combine a vision-based reinforcement learning policy trained in simulation with a force dynamics model trained on real robot data to infer actions that facilitate the dressing process without applying excessive force to the person.
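
As a loose illustration of how a vision-based policy and a force dynamics model could be combined at action-selection time (this is a minimal sketch, not the thesis implementation; vision_policy, force_model, and the force_limit value are hypothetical placeholders), one option is to sample candidate actions from the vision policy and keep only those whose predicted force stays within a comfort threshold:

import numpy as np

def select_action(vision_policy, force_model, point_cloud, force_reading,
                  num_candidates=16, force_limit=5.0):
    # Sample several candidate actions from the stochastic vision-based policy.
    candidates = [vision_policy.sample(point_cloud) for _ in range(num_candidates)]

    # Predict the contact force each candidate would induce, using a dynamics
    # model trained on real-robot force measurements (hypothetical interface).
    predicted = np.array([force_model.predict(force_reading, a) for a in candidates])

    # Prefer actions whose predicted force stays under the comfort threshold;
    # otherwise fall back to the candidate with the smallest predicted force.
    safe = np.where(predicted < force_limit)[0]
    best = int(safe[0]) if safe.size > 0 else int(np.argmin(predicted))
    return candidates[best]

The key design choice is that the force model acts as a safety filter on top of the simulation-trained vision policy, so comfort constraints learned from real data can override actions that would press too hard on the person.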

In the second part, we propose a novel framework that automatically generates reward functions for agents to learn new tasks by leveraging feedback from vision-language foundation models. Since our method only requires a text description of the task goal and the agent's visual observations, bypassing the need for low-dimensional state representations of objects, it is particularly well suited to deformable object manipulation tasks.
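
One common way to turn vision-language model feedback into a reward signal is to query the model for preferences between pairs of observations and use the answers to train a reward model. The sketch below shows that idea only in outline (vlm_client.ask is a hypothetical wrapper, not a real API, and the prompt wording is illustrative):

def label_preference(vlm_client, goal_text, image_a, image_b):
    # Ask the VLM which observation shows more progress toward the goal
    # described in natural language; the answer serves as a preference label
    # for reward learning without any ground-truth object state.
    prompt = (
        f"Task goal: {goal_text}\n"
        "Which image shows more progress toward this goal? Answer A or B."
    )
    answer = vlm_client.ask(prompt, images=[image_a, image_b])  # hypothetical call
    return 0 if answer.strip().upper().startswith("A") else 1

Because the only inputs are the goal description and rendered observations, this kind of labeling applies equally to cloth, garments, and other objects that lack a compact state representation.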

BibTeX

@mastersthesis{Sun-2024-141263,
author = {Zhanyi Sun},
title = {Leveraging Vision, Force Sensing, and Language Feedback for Deformable Object Manipulation},
year = {2024},
month = {June},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-24-32},
keywords = {Robot Learning, Deep Learning, Robotics Manipulation},
}