Title: Distributional Distance Classifiers for Goal-Conditioned Reinforcement Learning
Abstract:
Autonomous systems are increasingly being deployed in stochastic real-world environments. Often, these agents are trying to find the shortest path to a commanded goal. But what does it mean to find the shortest path in stochastic environments, where every strategy has a non-zero probability of failing? At the core of this question is a conflict between two seemingly natural notions of planning: maximizing the probability of reaching a goal state, and minimizing the expected number of steps to reach that goal state. Reinforcement learning (RL) methods based on minimizing the steps to a goal make an implicit assumption: that the goal is always reached, at least within some finite horizon. This assumption is violated in practical settings and can lead to highly suboptimal strategies.
In this work, we bridge the gap between these two notions of planning by estimating the probability of reaching the goal at different future timesteps. This is not the same as estimating the distance to the goal; rather, these probabilities convey uncertainty about whether the goal is ever reached at all. We then propose Distributional NCE, a practical RL algorithm for estimating these probabilities. Our value function resembles that used in distributional RL, but it is used to solve (reward-free) goal-reaching tasks rather than single-task reward maximization. Not only does Distributional NCE outperform state-of-the-art contrastive RL algorithms on standard goal-reaching tasks, but it can also be used to estimate the distribution of dynamical distances to the goal. Taken together, we believe that our results provide a cogent framework for thinking about probabilities and distances in stochastic settings, along with a practical and effective algorithm for goal-conditioned RL.
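As a rough illustration of the idea described above, the sketch below shows how a classifier over future timesteps could be trained with an NCE-style contrastive objective: positive goals are states actually reached t steps later on the same trajectory, negatives are goals drawn from other trajectories. This is a minimal sketch under assumed names and shapes (DistributionalGoalClassifier, nce_loss, a fixed horizon), not the authors' Distributional NCE implementation.

import torch
import torch.nn as nn

class DistributionalGoalClassifier(nn.Module):
    """Hypothetical classifier C(s, a, g) -> one logit per future timestep t,
    whose sigmoid approximates P(goal g is reached t steps in the future)."""
    def __init__(self, state_dim, action_dim, goal_dim, horizon, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, horizon),  # one logit per future timestep
        )

    def forward(self, state, action, goal):
        return self.net(torch.cat([state, action, goal], dim=-1))

def nce_loss(classifier, state, action, positive_goal, negative_goal, t_index):
    """NCE-style binary cross-entropy at the sampled timestep t_index.
    positive_goal: state reached t steps later on the same trajectory.
    negative_goal: goal sampled from a different trajectory in the batch."""
    bce = nn.functional.binary_cross_entropy_with_logits
    pos = classifier(state, action, positive_goal).gather(1, t_index.unsqueeze(1)).squeeze(1)
    neg = classifier(state, action, negative_goal).gather(1, t_index.unsqueeze(1)).squeeze(1)
    return bce(pos, torch.ones_like(pos)) + bce(neg, torch.zeros_like(neg))

Summing (or normalizing) the per-timestep probabilities then gives both an estimate of ever reaching the goal and a distribution over dynamical distances, which is the quantity the abstract refers to.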
Committee:
Prof. Jeff Schneider (advisor)
Prof. David Held
Homanga Bharadhwaj
Passcode: 352046