Equivalent Policy Sets for Learning Aligned Models and Abstractions - Robotics Institute Carnegie Mellon University

PhD Thesis Proposal

Benjamin (Ben) Freed
PhD Student, Robotics Institute, Carnegie Mellon University
Tuesday, February 7
1:00 pm to 2:30 pm
GHC 4405
Equivalent Policy Sets for Learning Aligned Models and Abstractions

Abstract:

Recent successes in model-based reinforcement learning (MBRL) have demonstrated the enormous value that learned representations of environmental dynamics (i.e., models) can impart to autonomous decision making. While a learned model can never perfectly represent the dynamics of complex environments, models that are accurate in the “right” ways may still be highly useful for decision making. However, what constitutes the “right” notion of accuracy is still an open question. Previous research has shown that the modeling objectives typically used in MBRL are not aligned with the overall objective of policy improvement, leading MBRL approaches to perform poorly on certain tasks, especially those with irrelevant and distracting details. We propose three approaches to improve the utility of models for autonomous decision making. First, we introduce the notion of the equivalent policy set (EPS), which we define to be the set of policies that our model cannot prove are suboptimal. We see the EPS as a tool for studying the inherent limitations of MBRL, as it represents the extent to which a model-based approach can distinguish between optimal and suboptimal policies. Second, we use the EPS to propose a new RL paradigm, which we refer to as Unified RL (URL), that seeks to combine the advantages of model-free and model-based RL. In URL, the sole goal of model learning is to rule out suboptimal policies (i.e., those not within the EPS), allowing the agent to narrow the policy search space for a model-free RL algorithm. Finally, we aim to unify existing forms of abstraction learning, such as state, action, and game abstraction, by framing abstraction as part of the model-learning process. Here, abstractions are learned that minimize the size of the EPS, thus maximally reducing the policy search space.

Thesis Committee Members:
Howie Choset, Co-chair
Jeff Schneider, Co-chair
Ruslan Salakhutdinov
Roberto Calandra, Meta AI Research
