Policy Decomposition - Robotics Institute Carnegie Mellon University

PhD Thesis Defense

Ashwin Rajendra Khadke, PhD Student, Robotics Institute, Carnegie Mellon University
Monday, June 3
11:00 am to 12:30 pm
NSH 4305
Policy Decomposition

Abstract:
Optimal control is a popular formulation for designing controllers for dynamic robotic systems. Under this formulation, the desired long-term behavior of the system is encoded in a cost function, and the policy, i.e., a mapping from the state of the system to control commands, that achieves the desired behavior is obtained by solving an optimization problem. A fundamental challenge in scaling up policy optimization to complex systems is that the computational requirement scales exponentially with the dimensionality of the state space. Owing to this curse of dimensionality, simplifying hierarchies are employed to reduce the computational burden. Very often, these hierarchies are hand-designed based on intuitions about the system’s dynamics and do not account for their effect on the system’s closed-loop behavior under the resulting policies. The systematic design of hierarchies to simplify controller synthesis is a critical and active area of research and is the focus of this work.

This thesis introduces Policy Decomposition, a framework that alleviates the curse of dimensionality by algorithmically reducing a complex policy optimization problem to a hierarchy of simpler subproblems that are much more tractable to solve. Two standout features of this framework are its ability to 1) automatically propose control hierarchies and 2) estimate a priori how the control performance of policies resulting from different hierarchies compares with that of the optimal policy. Additionally, we develop search methods based on genetic algorithms and Monte Carlo tree search to automatically discover promising hierarchies. As a result, hierarchies that dramatically reduce the computation required for policy optimization while sacrificing minimal control performance can be readily identified. The framework is agnostic to the choice of policy representation and optimization algorithm.
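
As a rough illustration of why decomposing a problem helps, the following sketch counts entries in a dense look-up table over a gridded state space. It is not taken from the thesis: the function names, the 6-dimensional system, the 20 bins per dimension, and the assumption that each subproblem sees only its own subset of state variables are all hypothetical choices made for the arithmetic.

```python
from typing import Sequence


def table_size(num_state_dims: int, bins_per_dim: int) -> int:
    """Entries in a dense look-up table over a uniformly gridded state space."""
    return bins_per_dim ** num_state_dims


def decomposed_size(subproblem_dims: Sequence[int], bins_per_dim: int) -> int:
    """Total entries when each subproblem gets its own, smaller table.

    Assumption (hypothetical): every subproblem depends only on its own
    subset of the state variables.
    """
    return sum(table_size(d, bins_per_dim) for d in subproblem_dims)


# Hypothetical 6-dimensional system, 20 grid points per dimension.
joint = table_size(6, 20)            # one policy over the full state space
split = decomposed_size([3, 3], 20)  # two 3-dimensional subproblems

print(joint, split, joint // split)
```

Under these assumptions, the joint table has 20^6 = 64,000,000 entries, while the two 3-dimensional tables together have 2 × 20^3 = 16,000, a 4,000-fold reduction. The count says nothing about how much control performance a given decomposition sacrifices, which is the question the framework's a priori estimates are meant to address.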

We demonstrate the generality of the Policy Decomposition framework by applying it to find hierarchies for several robotic systems, including a simplified biped and a quadcopter. Furthermore, we present results using Policy Iteration with look-up-table-based policy representations as well as more modern methods such as Proximal Policy Optimization with neural network policies. The discovered hierarchies either outperform heuristically constructed ones in closed-loop performance or provide dramatic reductions in required compute at the cost of marginally suboptimal control performance.

Thesis Committee:
Hartmut Geyer, Chair
Christopher Atkeson
Zachary Manchester
Nikolai Matni, University of Pennsylvania
Alex Gorodetsky, University of Michigan
