[MSR Thesis Talk] Kitchen Robot Case Studies: Learning Manipulation Tasks from Human Video Demonstrations - Robotics Institute Carnegie Mellon University

MSR Thesis Defense

Dingkun Guo
MSR Student, Robotics Institute, Carnegie Mellon University
Tuesday, December 5
3:30 pm to 5:00 pm
GHC 8102
[MSR Thesis Talk] Kitchen Robot Case Studies: Learning Manipulation Tasks from Human Video Demonstrations
Abstract: 
The vision of integrating a robot into the kitchen, capable of acting as a chef, remains a sought-after goal in robotics. Current robotic systems, mostly programmed for specific tasks, fall short in versatility and adaptability to a diverse culinary environment. While significant progress has been made in robotic learning, with advancements in behavior cloning, reinforcement learning, and recent strides in diffusion policies and transformers, the challenge remains to develop a robot that matches human capabilities in learning and generalizing across tasks, particularly in complex, unstructured real-world scenarios.
In this thesis, I focus on enabling robots to learn long-horizon manipulation tasks from a single human demonstration, using predefined primitives that generalize across similar objects and environments. We developed a system that processes RGB-D video demonstrations and uses Segment Anything to identify task-relevant key frames and poses. We then addressed the challenges a robot faces when replicating human actions, such as collisions and robot configuration limits. To validate the effectiveness of our approach, we conducted experiments on manual dishwashing. Starting from a single human demonstration recorded in a lab kitchen, the method was tested under varied conditions in a standard home kitchen that differs in geometry and appearance from the learning environment. (A small illustrative sketch of the key-frame idea follows below.)
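As a rough illustration only (not the thesis implementation), the sketch below assumes per-frame object masks, e.g. produced by Segment Anything, plus aligned depth images and camera intrinsics; it back-projects the masked pixels to a 3-D object centroid per frame and flags frames where the object nearly stops moving as candidate key frames. The function names, the threshold, and the motion-based heuristic are all illustrative assumptions.

    # Minimal sketch, assuming per-frame object masks and aligned depth frames.
    import numpy as np

    def depth_to_points(depth, fx, fy, cx, cy):
        """Back-project a depth image (meters) into an HxWx3 point map."""
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        z = depth
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        return np.stack([x, y, z], axis=-1)

    def object_centroids(depths, masks, fx, fy, cx, cy):
        """3-D centroid of the masked object in every frame."""
        centroids = []
        for depth, mask in zip(depths, masks):
            pts = depth_to_points(depth, fx, fy, cx, cy)[mask & (depth > 0)]
            centroids.append(pts.mean(axis=0))
        return np.asarray(centroids)  # shape (T, 3)

    def key_frame_indices(centroids, speed_thresh=0.01):
        """Flag frames where the object's per-frame motion drops below a
        threshold (meters/frame), a simple proxy for key poses such as
        grasping or placing."""
        speeds = np.linalg.norm(np.diff(centroids, axis=0), axis=1)
        slow = np.flatnonzero(speeds < speed_thresh) + 1
        if slow.size == 0:
            return slow
        # Keep only the first frame of each slow segment to avoid duplicates.
        return slow[np.insert(np.diff(slow) > 1, 0, True)]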
Further, we broadened the scope of learning to more general data sources, particularly videos from unstructured environments such as YouTube. By enabling unseen videos to serve as a source for specific robot learning tasks, we translated visual elements into physical constraints and goals in simulation, inferring the physics of the tasks. We demonstrated that this learning method transfers to real-world scenarios with actual robots on tasks including fruit cutting, dough manipulation, and pouring liquid; a toy sketch of the video-to-simulation-goal idea follows below.
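The following toy sketch is an assumption-laden illustration, not the thesis code: it shows how a quantity read off a video (here, a target fill fraction for pouring) can be treated as a goal and constraint for scoring candidate action parameters in simulation. `simulate_pour`, its parameters, and the random-search loop are hypothetical stand-ins for a real physics simulator and optimizer.

    # Minimal sketch: score simulated pour parameters against a video-derived goal.
    import numpy as np

    def simulate_pour(tilt_angle_rad, duration_s):
        """Hypothetical simulator stub: returns the fraction of liquid poured.
        A real pipeline would call a physics engine here."""
        return float(np.clip(0.8 * tilt_angle_rad * duration_s, 0.0, 1.0))

    def video_goal_cost(poured_fraction, goal_fraction, spill_penalty=10.0):
        """Penalize deviation from the fill level inferred from the video,
        with an extra constraint-style penalty for overpouring."""
        cost = (poured_fraction - goal_fraction) ** 2
        if poured_fraction > goal_fraction:
            cost += spill_penalty * (poured_fraction - goal_fraction)
        return cost

    def search_parameters(goal_fraction, n_samples=256, seed=0):
        """Random search over pour parameters against the video-derived goal."""
        rng = np.random.default_rng(seed)
        angles = rng.uniform(0.0, np.pi / 2, n_samples)
        durations = rng.uniform(0.5, 5.0, n_samples)
        costs = [video_goal_cost(simulate_pour(a, d), goal_fraction)
                 for a, d in zip(angles, durations)]
        best = int(np.argmin(costs))
        return angles[best], durations[best]

    if __name__ == "__main__":
        # Suppose the video implies the cup should end up about 60% full.
        print(search_parameters(goal_fraction=0.6))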
Committee:
Prof. Chris Atkeson (co-chair)
Prof. Jeff Ichnowski (co-chair)
Prof. David Held
Jianren Wang