PhD Thesis Defense

Shivam Vats
PhD Student, Robotics Institute, Carnegie Mellon University
Thursday, May 23
1:00 pm to 2:30 pm
NSH 4305
Plan to Learn: Active Robot Learning by Planning

Abstract:
Robots need a diverse repertoire of capable motor skills to succeed in the open world. Such a skill set cannot be learned or designed purely on human initiative. In this thesis, we advocate for an active continual learning approach that enables robots to take charge of their own learning. The goal of an autonomously learning robot should be to actively acquire skills that help it reliably achieve its long-term objective while minimizing the cost of data collection. To this end, we propose a novel Plan to Learn (P2L) framework, in which the robot solves a meta planning problem to decide what and how to learn. An action in this problem corresponds to the robot improving an existing skill or learning a new skill by collecting additional data. The solution to this problem gives the robot a strategy for optimally using the available time, computation, and human help toward achieving its overall mission. We formalize and study this idea through both a practical and a theoretical lens in two challenging robotics scenarios.
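To make the meta planning idea concrete, the following is a minimal sketch of one possible P2L-style decision step, assuming a greedy gain-per-cost formulation under a data-collection budget. The names (LearningAction, expected_gain, data_cost) and the greedy strategy are illustrative assumptions, not the formulation developed in the thesis.

```python
# Hypothetical sketch of a Plan-to-Learn (P2L) style meta planning step.
# All names and the greedy strategy are illustrative, not taken from the
# thesis; the actual formulation may differ substantially.
from dataclasses import dataclass

@dataclass
class LearningAction:
    name: str             # e.g., "improve grasping" or "learn a new push skill"
    expected_gain: float  # predicted increase in mission success probability
    data_cost: float      # cost of collecting the required training data

def plan_to_learn(actions, budget):
    """Greedy meta plan: pick learning actions with the best
    gain-per-cost ratio until the data-collection budget runs out."""
    plan, spent = [], 0.0
    for a in sorted(actions, key=lambda a: a.expected_gain / a.data_cost,
                    reverse=True):
        if spent + a.data_cost <= budget:
            plan.append(a)
            spent += a.data_cost
    return plan

candidates = [
    LearningAction("improve grasping", expected_gain=0.15, data_cost=2.0),
    LearningAction("learn pushing",    expected_gain=0.25, data_cost=5.0),
    LearningAction("improve placing",  expected_gain=0.05, data_cost=0.5),
]
print([a.name for a in plan_to_learn(candidates, budget=6.0)])
```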

First, we explore how robots can plan to learn online as part of a collaborative human-robot team. We develop an optimal mixed integer programming-based planner, Act, Delegate, or Learn (ADL), to decide which skills the robot should learn to reduce its teammate's workload. Next, we explore multi-step tasks, such as opening a door and placing a book on a bookshelf, under state uncertainty. Our first algorithm, MetaReasoning for Skill Learning (MetaReSkill), estimates a probabilistic model of skill improvement to predict how each skill would improve with additional training. Our planner then uses this model to identify and prioritize skills that are both easy to learn and most relevant to the overall task. Finally, we present RecoveryChaining, which solves the P2L problem for recovery learning using reinforcement learning. RecoveryChaining is a hybrid approach to challenging manipulation tasks in which a recovery policy is learned to robustify model-based controllers. Our approach learns both where and how to recover by leveraging a hybrid action space consisting of primitive robot actions and switch actions that transfer control to a model-based controller. We demonstrate the effectiveness of the P2L framework on a variety of practically motivated, challenging manipulation tasks, both in simulation and in the real world.
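The sketch below illustrates the kind of skill-improvement prediction described above: a per-skill learning-curve model used to rank skills by how much extra training data would help, weighted by task relevance. The power-law curve form and all parameter values are assumptions for illustration only, not MetaReSkill's actual model.

```python
# Illustrative sketch of prioritizing skills by predicted improvement,
# in the spirit of modeling skill improvement from additional training.
# The power-law learning curve and all parameters are assumptions.

def predicted_success(n_samples, a, b, c):
    # Power-law learning curve: success rate approaches `a` as data grows.
    # A real system would fit these parameters per skill from training logs.
    return a - b * (n_samples + 1) ** (-c)

def prioritize(skills, extra_samples):
    """Rank skills by predicted gain from `extra_samples` more data,
    weighted by each skill's relevance to the overall task."""
    def score(s):
        gain = (predicted_success(s["n"] + extra_samples, **s["curve"])
                - predicted_success(s["n"], **s["curve"]))
        return gain * s["relevance"]
    return sorted(skills, key=score, reverse=True)

skills = [
    {"name": "open_door",  "n": 10, "relevance": 1.0,
     "curve": {"a": 0.95, "b": 0.6, "c": 0.8}},
    {"name": "place_book", "n": 50, "relevance": 0.7,
     "curve": {"a": 0.90, "b": 0.5, "c": 1.0}},
]
print([s["name"] for s in prioritize(skills, extra_samples=20)])
```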

This thesis is only a first step toward building autonomously learning robots that will one day be an integral part of human life. We sincerely hope that the developed framework and its instantiations on these manipulation tasks pave the way for further research.

Thesis Committee Members:
Maxim Likhachev, Co-chair
Oliver Kroemer, Co-chair
Reid Simmons
George Konidaris, Brown University
