Policy Decomposition: A Framework for Discovery of Control Hierarchies for Efficient Policy Optimization with Suboptimality Estimates - Robotics Institute Carnegie Mellon University

Ashwin Khadke
PhD Thesis, Tech. Report, CMU-RI-TR-24-38, July 2024

Abstract

Optimal control is a popular formulation for designing controllers for dynamic robotic systems. Under this formulation, the desired long-term behavior of the system is encoded in a cost function, and the policy, i.e., a mapping from the state of the system to control commands, that achieves this behavior is derived by solving an optimization problem. A fundamental challenge in scaling policy optimization to complex systems is that the computational requirement grows exponentially with the dimensionality of the state space. Owing to this curse of dimensionality, simplifying hierarchies are employed to reduce the computational burden. Very often, these hierarchies are hand-designed based on intuitions about the system's dynamics and do not account for their effect on the system's closed-loop behavior under the resulting policies. The systematic design of hierarchies to simplify controller synthesis is a critical and active area of research and is the focus of this work.

This thesis introduces Policy Decomposition, a framework that alleviates the curse of dimensionality by algorithmically reducing a complex policy optimization problem to a hierarchy of simpler, far more tractable subproblems. Two standout features of the framework are its ability to 1) automatically propose control hierarchies and 2) estimate a priori how the control performance of policies resulting from different hierarchies compares with that of the optimal policy. Additionally, we develop search methods based on a Genetic Algorithm and Monte Carlo Tree Search to automatically discover promising hierarchies. As a result, hierarchies that dramatically reduce the computation required for policy optimization while sacrificing little control performance can be readily identified. The framework is agnostic to the choice of policy representation and optimization algorithm.
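As a rough illustration of why splitting a problem into lower-dimensional subproblems can pay off, the sketch below counts the cells in a look-up table over a gridded state space, first for the full problem and then for a partition into smaller groups of state dimensions. The grid resolution, the four-dimensional state, and the particular split are hypothetical values chosen for illustration. The cell count is only a crude proxy for compute, and this is not the decomposition procedure developed in the thesis.

```python
from math import prod


def table_size(points_per_dim):
    """Number of grid cells in a look-up table with the given resolution per dimension."""
    return prod(points_per_dim)


def decomposed_size(points_per_dim, groups):
    """Total grid cells when each group of state dimensions gets its own smaller table."""
    return sum(table_size([points_per_dim[i] for i in group]) for group in groups)


# Hypothetical example: a 4-dimensional state with 31 grid points per dimension.
resolution = [31, 31, 31, 31]
# Hypothetical split into two 2-dimensional subproblems (indices into `resolution`).
groups = [[0, 1], [2, 3]]

full = table_size(resolution)                # 31**4 cells for the joint problem
split = decomposed_size(resolution, groups)  # 2 * 31**2 cells for the two subproblems
print(f"full table: {full} cells, decomposed tables: {split} cells")
```

Under these assumed numbers the joint table has 923521 cells while the two smaller tables together have 1922, which is the kind of gap that motivates searching for good hierarchies. The count says nothing about how much closed-loop performance a given split gives up, which is the question the suboptimality estimates address.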

We demonstrate the generality of the Policy Decomposition framework by applying it to find hierarchies for several robotic systems, including a simplified biped and a quadcopter. Furthermore, we present results using Policy Iteration with look-up-table-based policy representations as well as more modern methods such as Proximal Policy Optimization with neural network policies. The discovered hierarchies either outperform heuristically constructed ones in closed-loop performance or provide dramatic reductions in required compute with only marginally suboptimal control performance.

BibTeX

@phdthesis{Khadke-2024-141804,
author = {Ashwin Khadke},
title = {Policy Decomposition: A Framework for Discovery of Control Hierarchies for Efficient Policy Optimization with Suboptimality Estimates},
year = {2024},
month = {July},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-24-38},
keywords = {Optimal control and reinforcement learning, Hierarchical control, Evolutionary methods},
}