Reasoning with Latent Diffusion in Offline Reinforcement Learning
Abstract
Offline reinforcement learning (RL) holds promise as a means to learn high-reward policies from a static dataset, without the need for further environment interactions. However, a key challenge in offline RL lies in effectively stitching portions of suboptimal trajectories from the static dataset while avoiding extrapolation errors that arise from a lack of support in the dataset. Existing approaches either use conservative methods that are difficult to tune and, as we show, struggle with multimodal data, or rely on noisy Monte Carlo return-to-go samples for reward conditioning. In this work, we propose a novel approach that leverages the expressiveness of latent diffusion to model in-support trajectory sequences as compressed latent skills. This facilitates learning a Q-function while avoiding extrapolation error via batch constraining. The latent space is also expressive and copes gracefully with multimodal data. We show that the learned temporally abstract latent space encodes richer task-specific information for offline RL tasks than raw state-action sequences. This improves credit assignment and facilitates faster reward propagation during Q-learning. Our method demonstrates state-of-the-art performance on the D4RL benchmarks, particularly excelling in long-horizon, sparse-reward tasks.
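To make the batch-constraining idea concrete, below is a minimal sketch of how a Q-learning target can be restricted to latent skills drawn from a learned generative prior, so the maximization never evaluates out-of-support latents. The `prior` and `q_net` objects, their signatures, and all hyperparameters are hypothetical placeholders for illustration, not the paper's released implementation.

```python
import torch

def batch_constrained_q_target(q_net, prior, next_state, reward, done,
                               gamma=0.995, num_candidates=32):
    """TD target where the max is taken only over latent skills sampled
    from a learned prior (hypothetical API), keeping targets in-support."""
    with torch.no_grad():
        # Sample candidate latent skills conditioned on the next state.
        # Assumed shape: [batch, num_candidates, latent_dim].
        z_candidates = prior.sample(next_state, num_candidates)

        # Evaluate Q(s', z) for each candidate and keep the best one.
        s_expanded = next_state.unsqueeze(1).expand(-1, num_candidates, -1)
        q_values = q_net(s_expanded, z_candidates)   # [batch, num_candidates]
        best_q = q_values.max(dim=1).values          # [batch]

        # Standard bootstrapped target over per-transition reward/done tensors.
        target = reward + gamma * (1.0 - done) * best_q
    return target
```

Because the candidates come from a generative model fit to the dataset, the argmax is effectively constrained to the batch's support, which is the mechanism the abstract refers to for avoiding extrapolation error.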
BibTeX
@conference{Vekatraman-2024-143610,
author = {Siddharth Venkatraman and Shivesh Khaitan and Ravi Tej Akella and John M. Dolan and Jeff Schneider and Glen Berseth},
title = {Reasoning with Latent Diffusion in Offline Reinforcement Learning},
booktitle = {Proceedings of (ICLR) International Conference on Learning Representations},
year = {2024},
month = {May},
keywords = {offline reinforcement learning, diffusion, multimodal data},
}