Bi-Directional Value Learning for Risk-Aware Planning Under Uncertainty
Abstract
Decision-making under uncertainty is a crucial ability for autonomous systems. In its most general form, this problem can be formulated as a partially observable Markov decision process (POMDP). The solution policy of a POMDP can be implicitly encoded as a value function. In partially observable settings, the value function is typically learned via forward simulation of the system evolution. Focusing on accurate and long-range risk assessment, we propose a novel method in which the value function is learned in two complementary phases via a bi-directional search in belief space. A backward value learning process provides a long-range, risk-aware base policy; a forward value learning process ensures local optimality and updates the policy via forward simulations. We evaluate the safety, scalability, and optimality of the proposed algorithm on a class of continuous-space rover navigation problems of varying scale. The results demonstrate the algorithm's ability to assess the long-range risk and safety of the planner while solving continuous problems with long planning horizons.
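To make the two phases concrete, the sketch below illustrates the idea on a toy discrete problem: a backward pass (value iteration over all nodes) yields a long-range, risk-aware base value function, and a forward pass (rollout-style local backups from the start node, in the spirit of real-time dynamic programming) refines it along visited trajectories. The chain model, hazard/goal rewards, and the RTDP-style forward update are illustrative assumptions for exposition, not the paper's rover-navigation setup or its belief-space implementation.

import numpy as np

n, gamma = 7, 0.95      # nodes 0..6; node 0 = hazard, node 6 = goal
actions = (-1, +1)      # move left / right along the chain
SLIP = 0.1              # probability the commanded move is reversed

def successors(s, a):
    """Return [(prob, next_node, reward), ...] for node s and move a."""
    out = []
    for move, p in ((a, 1 - SLIP), (-a, SLIP)):
        ns = min(max(s + move, 0), n - 1)
        r = -100.0 if ns == 0 else (0.0 if ns == n - 1 else -1.0)
        out.append((p, ns, r))
    return out

def backup(V, s):
    """One Bellman backup at node s; terminal nodes keep value 0."""
    if s in (0, n - 1):
        return 0.0
    return max(sum(p * (r + gamma * V[ns]) for p, ns, r in successors(s, a))
               for a in actions)

# Backward phase: sweep all nodes to convergence; the -100 hazard
# penalty propagates, giving a long-range, risk-aware base value.
V = np.zeros(n)
for _ in range(200):
    V = np.array([backup(V, s) for s in range(n)])

# Forward phase: greedy rollouts from the start node update V locally
# along visited trajectories (RTDP-style refinement of the base values).
rng = np.random.default_rng(0)
start = 1
for _ in range(50):
    s = start
    while s not in (0, n - 1):
        V[s] = backup(V, s)                        # local value update
        a = max(actions, key=lambda a: sum(        # greedy action choice
            p * (r + gamma * V[ns]) for p, ns, r in successors(s, a)))
        # index 1 (the slip outcome) is chosen with probability SLIP
        p, ns, r = successors(s, a)[int(rng.random() < SLIP)]
        s = ns

print("Refined values:", np.round(V, 2))

In the paper, the backward phase runs offline over a sparse belief graph and the forward phase refines the policy online; here both phases share the same toy model for brevity.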
BibTeX
@article{Kim-2019-113686,
author = {Sung Kyun Kim and Rohan Thakker and Ali-Akbar Agha-Mohammadi},
title = {Bi-Directional Value Learning for Risk-Aware Planning Under Uncertainty},
journal = {IEEE Robotics and Automation Letters},
year = {2019},
month = {July},
volume = {4},
number = {3},
pages = {2493--2500},
}