Abstract:
Robots operating in the real world need fast and intelligent decision-making systems. While these systems have traditionally consisted of human-engineered behaviors and world models, there has been growing interest in integrating data-driven components to achieve faster execution and reduce hand-engineering. Unfortunately, such learning-based methods require large amounts of training data that is often expensive and time-consuming to obtain in robotics. To address this, we formulate learning in a decision-making system with many components as a resource allocation problem. Our objective is to optimise the performance of the system over a distribution of tasks under hard and soft constraints on resources such as human demonstrations and computation.
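As an illustrative sketch (the notation below is ours, not taken from the thesis), the allocation problem can be viewed as choosing how much of each resource to spend on each component of the system so as to maximise expected task performance under budget constraints:

\[
\max_{x_1,\dots,x_K} \;\; \mathbb{E}_{\tau \sim \mathcal{T}}\!\left[ J(\pi_{x_1,\dots,x_K};\, \tau) \right]
\quad \text{s.t.} \quad \sum_{k=1}^{K} c_r(x_k) \le B_r \;\; \text{for each resource } r,
\]

where $x_k$ is the allocation to component $k$, $\pi_{x_1,\dots,x_K}$ is the resulting system, $J$ is task performance, $\mathcal{T}$ is the task distribution, and $B_r$ is the budget for resource $r$ (e.g., demonstrations or compute); soft constraints can instead enter the objective as penalty terms.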
We first evaluate this idea in an offline setting to learn recovery skills for sequential manipulation. Skills learnt using popular techniques, such as learning from demonstrations and reinforcement learning, are often brittle to failures induced by state uncertainty. Instead of naively training them on more data, we identify their failure modes in simulation and learn the corresponding recovery skills. In every training round, our resource allocation algorithm focuses resources on the recoveries that are expected to improve performance the most. Next, we evaluate our approach in an online setting motivated by collaborative manufacturing, with the goal of completing a given sequence of tasks with minimal combined human-robot effort. In such settings, a limited amount of online robot teaching can extend the robot's capabilities and make it more useful. Given a sequence of tasks, we propose a cost-optimal planner, Act, Delegate or Learn (ADL), that determines when to assign a task to the robot, when to delegate it to a human, and when to teach the robot.
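To make the trade-off concrete, the following is a minimal, greedy sketch of an Act/Delegate/Learn-style decision rule; all function names are hypothetical, and the thesis's ADL planner is cost-optimal over the full sequence rather than greedy as here. The intuition is that teaching pays an upfront cost but makes the robot cheaper on the remaining similar tasks.

def adl_sketch(tasks, robot_cost, human_cost, teach_cost, is_similar):
    """Greedy illustration only: pick the cheapest option for each task,
    amortising the teaching cost over the remaining similar tasks."""
    taught = []       # tasks the robot has already been taught
    decisions = []
    for i, task in enumerate(tasks):
        known = any(is_similar(task, t) for t in taught)
        n_remaining = sum(is_similar(task, t) for t in tasks[i:])
        options = {
            "act": robot_cost(task) if known else float("inf"),
            "delegate": human_cost(task),
            # teaching cost spread over the tasks it is expected to help
            "learn": robot_cost(task) + teach_cost(task) / max(n_remaining, 1),
        }
        choice = min(options, key=options.get)
        if choice == "learn":
            taught.append(task)
        decisions.append(choice)
    return decisions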
Our first proposed work builds on our recovery-learning work to deal with partial observability more directly. The main limitation of our current approach is that it ignores the uncertainty associated with the state during execution. While useful in practice, this approach cannot decide when to take information-gathering actions instead of goal-reaching actions. To address this, we will formulate the problem as a Belief-Space MDP and learn recovery skills in the belief space. Our second proposed work extends our resource allocation approach to a new domain: motion planning for off-road navigation. These planners are slow because they rely on computationally expensive physics-based simulators as their forward dynamics models. Planning with a learnt dynamics model can speed up this process; however, learning a model that covers the whole state space is intractable, since data generation relies on the same slow simulator. Instead, we propose to make efficient use of offline simulation by training the model only over the part of the state space relevant to the task distribution at hand.
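For reference (standard POMDP notation, not taken from the thesis), in a belief-space MDP the recovery policy acts on a belief $b(s)$ over states rather than on a point estimate, and the belief is updated after taking action $a$ and receiving observation $o$ via the usual Bayes filter:

\[
b'(s') \propto O(o \mid s', a) \sum_{s} T(s' \mid s, a)\, b(s),
\]

so a policy $\pi: b \mapsto a$ can explicitly trade off information-gathering actions, which sharpen $b$, against goal-reaching actions.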
Thesis Committee Members:
Maxim Likhachev, Co-chair
Oliver Kroemer, Co-chair
Reid Simmons
George Konidaris, Brown University