Carnegie Mellon University
Abstract:
Manipulation of deformable objects challenges common assumptions made for rigid objects. Deformable objects have high-dimensional intrinsic state representations and complex dynamics with many degrees of freedom, making state estimation and planning difficult.
The completed work can be divided into two parts. In the first part, we explore reinforcement learning (RL) as a framework for learning a policy directly from sensory observations such as images. However, RL is known to be sample inefficient, especially from high-dimensional inputs, and it requires a reward function. To find a good representation that reduces sample complexity, we propose a method that combines a set of self-supervised auxiliary tasks to learn a shared representation that is also used for the main RL task at hand. To specify a reward function directly from high-dimensional observations, we propose an indicator reward function that allows us to solve goal-reaching problems without access to the ground-truth state during training. In the second part, we focus on solving a range of deformable object manipulation (DOM) tasks. As a first step, we propose SoftGym, the first benchmark for deformable object manipulation, which includes manipulation of rope, cloth, and fluid. We show that directly applying RL to images without any inductive bias performs poorly. We then aim to find a good intermediate representation for deformable objects. Specifically, for the task of cloth manipulation, we propose to model the visible part of the cloth as a mesh and learn the mesh dynamics. We show that such an explicit representation achieves better performance and generalization than using images or a latent representation.
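A minimal sketch of the indicator-reward idea is given below, assuming the current and goal observations are compared through some learned embedding; the distance metric and threshold are illustrative assumptions, not the exact formulation used in the completed work.

```python
import numpy as np

def indicator_reward(obs_embedding, goal_embedding, threshold=0.05):
    # Sparse goal-reaching reward computed purely from (embedded) observations:
    # 1.0 if the current observation is close enough to the goal observation,
    # 0.0 otherwise. No ground-truth state is required during training.
    # (Hypothetical embedding inputs and threshold; a sketch, not the thesis method.)
    distance = np.linalg.norm(obs_embedding - goal_embedding)
    return 1.0 if distance < threshold else 0.0
```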
The completed work focuses on short-horizon tasks or tasks solvable with simple motion primitives such as pick-and-place. In the proposed work, we go one step further and aim to solve the task of sequential manipulation of dough using tools, which requires reasoning over a longer horizon. As no obvious motion primitives exist for these tasks, we propose to learn skills from a trajectory optimizer in a differentiable simulator and then plan over these skills to solve the long-horizon task. Preliminary results show significant improvement over RL baselines. We plan to improve on the current method with several extensions, including better training of the neural skill abstractors, using a particle-based representation, and transferring the skills to the real world.
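A rough sketch of planning over learned skills, assuming each skill is summarized by a hypothetical abstractor that predicts the resulting state from the current state and skill parameters; the exhaustive search, cost function, and skill interface here are illustrative assumptions rather than the proposed method.

```python
import itertools
import numpy as np

def plan_over_skills(init_state, goal_state, skill_models, candidate_params, horizon=3):
    # Exhaustive search over short skill sequences. Each skill model maps
    # (state, params) -> predicted next state; the sequence whose predicted
    # final state is closest to the goal is returned.
    best_plan, best_cost = None, np.inf
    for skills in itertools.product(skill_models.items(), repeat=horizon):
        for params in itertools.product(candidate_params, repeat=horizon):
            state = init_state
            for (name, model), p in zip(skills, params):
                state = model(state, p)
            cost = np.linalg.norm(state - goal_state)
            if cost < best_cost:
                best_cost = cost
                best_plan = [(name, p) for (name, _), p in zip(skills, params)]
    return best_plan, best_cost
```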
Thesis Committee Members:
David Held, Chair
Abhinav Gupta
Deepak Pathak
Ken Goldberg, University of California, Berkeley