Leveraging Affordances for Accelerating Online Reinforcement Learning

Master's Thesis, Tech. Report CMU-RI-TR-24-45, July 2024

Abstract

A long-standing problem in online reinforcement learning (RL) is ensuring sample efficiency. This stems from the inability of a learning agent to explore its environment efficiently. Most attempts at efficient exploration tackle the problem in a setting where no prior information is used to bootstrap learning, and thus fail to leverage additional affordances that may be cheaply available at training time. These could include expert demonstrations, simulators that can reset to arbitrary states, and domain-specific inductive biases. Such affordances are valuable resources with enormous potential to guide exploration and speed up learning, yet their use in accelerating exploration remains underexplored in the existing literature.

Consequently, in this thesis, we study different ways to utilize such affordances to facilitate faster online learning. We first incorporate affordances into the output end of a policy, describing a method that re-parameterizes the action space of driving policies through Bézier curves to induce an inductive bias towards natural vehicular trajectories. This enables us to learn challenging driving behaviour over 12× faster than with traditional instantaneous action spaces. Subsequently, we study how affordances influencing the input end of a policy can improve learning efficiency. When provided with an arbitrarily resettable simulator, we find that training with a suitably chosen auxiliary start-state distribution, which may differ from the true start-state distribution of the underlying Markov Decision Process, can significantly improve sample efficiency. In particular, using a notion of safety to inform the choice of this auxiliary distribution accelerates learning markedly. We empirically demonstrate the effectiveness of this approach on a suite of MuJoCo continuous-control tasks and on a hard-exploration, sparse-reward navigation task.
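To make the first idea concrete, below is a minimal sketch of a Bézier-curve action parameterization: rather than emitting an instantaneous control at every step, the policy outputs a small set of 2D control points, which are expanded via the Bernstein basis into a smooth trajectory of waypoints for a low-level controller to track. The function names and the choice of a cubic curve are illustrative assumptions, not the thesis's exact implementation.

    import numpy as np
    from math import comb

    def bezier_trajectory(control_points, num_samples=20):
        # control_points: (n+1, 2) array of 2D control points output by the policy.
        # Returns a (num_samples, 2) array of waypoints along the curve.
        n = len(control_points) - 1
        t = np.linspace(0.0, 1.0, num_samples)
        # Bernstein basis: B_{i,n}(t) = C(n, i) * t^i * (1 - t)^(n - i)
        basis = np.stack(
            [comb(n, i) * t**i * (1.0 - t)**(n - i) for i in range(n + 1)],
            axis=1,
        )
        return basis @ control_points

    # Example: a policy head emitting 8 numbers defines a cubic curve
    # (4 control points), which becomes 20 waypoints to track.
    raw_action = np.random.randn(8)  # stand-in for a policy output
    waypoints = bezier_trajectory(raw_action.reshape(4, 2))

Similarly, the second idea can be sketched as a generic training loop, assuming the simulator exposes a reset-to-arbitrary-state affordance. All names here (reset_to, aux_start_sampler, policy.act, policy.update) are hypothetical placeholders, and how the auxiliary distribution encodes safety is a design choice left abstract.

    def train_with_auxiliary_starts(env, policy, aux_start_sampler, num_episodes=1000):
        # env.reset_to(state) is the assumed arbitrary-reset affordance;
        # aux_start_sampler() draws from an auxiliary start-state distribution
        # (e.g. one biased towards states deemed safe or informative), which
        # may differ from the MDP's true start-state distribution.
        for _ in range(num_episodes):
            state = env.reset_to(aux_start_sampler())
            done = False
            transitions = []
            while not done:
                action = policy.act(state)
                next_state, reward, done, _ = env.step(action)
                transitions.append((state, action, reward, next_state, done))
                state = next_state
            policy.update(transitions)  # any standard RL update consumes the rollout

Note that only training rollouts start from the auxiliary distribution; evaluation should still reset from the MDP's true start-state distribution so that the reported performance reflects the original task.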

BibTeX

@mastersthesis{Mehra-2024-142215,
author = {Aman Mehra},
title = {Leveraging Affordances for Accelerating Online Reinforcement Learning},
year = {2024},
month = {July},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-24-45},
keywords = {Reinforcement Learning, Sample-efficiency, Online RL},
}