MSR Thesis Talk: Ravi Tej Akella - Robotics Institute Carnegie Mellon University

MSR Thesis Defense

Ravi Tej Akella, MSR Student
Robotics Institute, Carnegie Mellon University
Friday, June 30
9:00 am to 10:30 am
NSH 4305

Title: Distributional Distance Classifiers for Goal-Conditioned Reinforcement Learning

Abstract:
Autonomous systems are increasingly being deployed in stochastic real-world environments. Often, these agents are trying to find the shortest path to a commanded goal. But what does it mean to find the shortest path in stochastic environments, where every strategy has a non-zero probability of failing? At the core of this question is a conflict between two seemingly natural notions of planning: maximizing the probability of reaching a goal state, and minimizing the expected number of steps to reach that goal state. Reinforcement learning (RL) methods based on minimizing the steps to a goal make an implicit assumption: that the goal is always reached, at least within some finite horizon. This assumption is violated in practical settings and can lead to highly suboptimal strategies.

In this work, we bridge the gap between these two notions of planning by estimating the probability of reaching the goal at different future timesteps. This is not the same as estimating the distance to the goal; rather, the probabilities convey uncertainty about ever reaching the goal at all. We then propose a practical RL algorithm, Distributional NCE, for estimating these probabilities. Our value function resembles the one used in distributional RL, but it is used to solve (reward-free) goal-reaching tasks rather than (single) reward-maximization tasks. Not only does Distributional NCE outperform state-of-the-art contrastive RL algorithms on standard goal-reaching tasks, but it can also be used to estimate the distribution of dynamical distances to the goal. Taken together, we believe that our results provide a cogent framework for thinking about probabilities and distances in stochastic settings, along with a practical and effective algorithm for goal-conditioned RL.
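
The conflict the abstract describes between the two notions of planning can be illustrated with a toy calculation. The sketch below is not the Distributional NCE algorithm and is not taken from the thesis. The two strategies, their names, and their arrival probabilities are made-up assumptions, chosen only to show that "most likely to arrive" and "fewest expected steps when it arrives" can pick different strategies.

```python
# Hypothetical arrival-time distributions (illustrative numbers only):
# each maps a timestep t to P(goal is first reached at step t).
# Probability mass missing from a distribution is the chance of never arriving.
ARRIVAL = {
    "risky_shortcut": {2: 0.5},   # fast, but fails half of the time
    "safe_detour": {6: 0.95},     # slow, but almost always arrives
}


def success_probability(dist):
    """Total probability of ever reaching the goal."""
    return sum(dist.values())


def expected_steps_given_success(dist):
    """Expected arrival time, conditioned on the goal being reached."""
    return sum(t * p for t, p in dist.items()) / success_probability(dist)


# Criterion 1: maximize the probability of reaching the goal.
best_by_probability = max(
    ARRIVAL, key=lambda name: success_probability(ARRIVAL[name])
)

# Criterion 2: minimize expected steps, silently assuming the goal is reached.
best_by_steps = min(
    ARRIVAL, key=lambda name: expected_steps_given_success(ARRIVAL[name])
)

print("max probability picks:", best_by_probability)
print("min expected steps picks:", best_by_steps)
```

Under these assumed numbers the two criteria disagree, because the step-count criterion ignores the half of the risky strategy's probability mass that never arrives. Keeping the full distribution over arrival times, rather than a single distance, preserves that information.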

Committee:
Prof. Jeff Schneider (advisor)
Prof. David Held
Homanga Bharadhwaj

Meeting ID: 993 6326 3527
Passcode: 352046