Carnegie Mellon University
Abstract:
Modern planning methods are effective at computing feasible and optimal plans for robotic tasks when given access to accurate dynamical models. However, robots operating in the real world often face situations that cannot be modeled perfectly before execution, so we typically have access only to simplified and potentially inaccurate models. This imperfect modeling can lead to highly suboptimal plans, or even to the robot failing to reach the goal during execution. Existing approaches offer a learning-based solution in which real-world experience is used to learn a complex dynamical model that is subsequently used for planning. However, this requires a prohibitively large amount of experience over the entire state space, and it can be wasteful when the objective is to complete the task rather than to model the dynamics accurately. Furthermore, real robots often have operating constraints and cannot spend hours acquiring experience to learn dynamics. This thesis argues that by updating the behavior of the planner rather than the dynamics of the model, we can leverage simplified and potentially inaccurate models and significantly reduce the amount of real-world experience needed to provably guarantee that the robot completes the task.
In completed work, we proposed two approaches in support of this argument. The first approach, CMAX, guarantees that the robot reaches the goal using the inaccurate model without any resets. It achieves this by biasing the planner away from transitions whose dynamics are discovered, during online execution, to be inaccurately modeled. However, CMAX requires strong assumptions on the accuracy of the model used for planning and does not improve solution quality over repetitions of the same task. The second approach, CMAX++, leverages real-world experience to improve the quality of the resulting plans over successive repetitions of a robotic task. CMAX++ achieves this by integrating model-free learning from acquired experience with model-based planning using the potentially inaccurate model. As a consequence, in addition to completeness, CMAX++ guarantees asymptotic convergence to the optimal path cost as the number of repetitions increases, under relaxed assumptions. Crucially, unlike existing methods for planning with inaccurate models, neither approach requires any updates to the dynamics of the model.
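The core idea behind CMAX, biasing the planner away from transitions found to be inaccurately modeled instead of correcting the model, can be illustrated with a small sketch. This is not the thesis's CMAX implementation: the grid world, the hidden wall, the constant `penalty` on flagged transitions, and the use of full Dijkstra replanning after every step are all illustrative assumptions, and every name below (`model_step`, `true_step`, `plan`, `run`) is hypothetical.

```python
import heapq

# Hypothetical 4-connected grid world used only for illustration.
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
WIDTH, HEIGHT = 5, 3
WALLS = {(2, 0), (2, 1)}  # present in the real world, absent from the model


def model_step(state, action):
    """Simplified (inaccurate) model: ignores walls, only clips at the grid border."""
    dx, dy = ACTIONS[action]
    x, y = state[0] + dx, state[1] + dy
    return (x, y) if 0 <= x < WIDTH and 0 <= y < HEIGHT else state


def true_step(state, action):
    """Real-world dynamics: moving into a wall leaves the robot where it was."""
    nxt = model_step(state, action)
    return state if nxt in WALLS else nxt


def plan(start, goal, discrepancies, penalty):
    """Dijkstra over the model; returns the first action of a cheapest path.

    Transitions flagged as inaccurate cost `penalty` instead of 1, which
    biases the planner away from them without changing model_step itself.
    """
    dist = {start: 0.0}
    first_action = {start: None}
    frontier = [(0.0, start)]
    while frontier:
        d, s = heapq.heappop(frontier)
        if d > dist[s]:
            continue  # stale heap entry
        if s == goal:
            return first_action[s]
        for a in ACTIONS:
            nxt = model_step(s, a)
            nd = d + (penalty if (s, a) in discrepancies else 1.0)
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                first_action[nxt] = a if s == start else first_action[s]
                heapq.heappush(frontier, (nd, nxt))
    return None


def run(start, goal, penalty=1e3, max_steps=200):
    """Plan, execute one action, compare with the model's prediction, repeat."""
    discrepancies = set()
    state, path = start, [start]
    for _ in range(max_steps):
        if state == goal:
            break
        action = plan(state, goal, discrepancies, penalty)
        if action is None:
            break
        observed = true_step(state, action)
        if observed != model_step(state, action):
            discrepancies.add((state, action))  # the model was wrong here
        state = observed
        path.append(state)
    return path, discrepancies


path, discrepancies = run(start=(0, 1), goal=(4, 1))
print(path[-1], sorted(discrepancies))
```

In this sketch the robot first tries to drive straight through the wall, observes that it did not move, flags that state-action pair, and replans around it through the gap at (2, 2). Note that `model_step` is never modified: only the planner's cost for flagged transitions changes, which mirrors the argument above of updating the planner rather than the model dynamics.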
For the remainder of the thesis, we propose to combine the advantages of existing methods that update the dynamics of the model with those of our methods that update the behavior of the planner. The goal is to create a unified framework in which the robot, during the course of its execution, intelligently switches between (a) learning the true dynamics, (b) learning a model-free value estimate, or (c) biasing the planner away from an inaccurately modeled transition, guaranteeing task completion while reducing the amount of real-world experience required. We also plan to explore this unified framework in the episodic setting, where the robot has access to resets, and in settings where the dynamics are nondeterministic.
Thesis Committee Members:
Maxim Likhachev, Co-chair
Drew Bagnell, Co-chair
Oliver Kroemer
Leslie Kaelbling, Massachusetts Institute of Technology