POMDP and Hierarchical Options MDP with Continuous Actions for Autonomous Driving at Intersections
Abstract
When applying autonomous driving technology to real-world scenarios, environmental uncertainties make the development of decision-making algorithms difficult. Modeling the problem as a Partially Observable Markov Decision Process (POMDP) [1] allows the algorithm to account for these uncertainties in the decision process, which makes it more robust to real sensor characteristics. However, solving a POMDP with reinforcement learning (RL) [2] often requires storing a large number of observations, and for continuous action spaces the system is computationally inefficient. This paper addresses these problems by modeling the problem as an MDP and learning a policy with RL using hierarchical options (HOMDP). The proposed algorithm stores state-action pairs and uses only the current observation to solve a POMDP problem. We compare the results of our approach to the time-to-collision method [3] and a POMDP-with-LSTM method. Our results show that the HOMDP approach improves the agent's performance on a four-way intersection task with two-way stop signs. The HOMDP method can generate both higher-level discrete options and lower-level continuous actions using only the observations of the current step.
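To make the two-level structure described above concrete, the following is a minimal sketch (not the authors' implementation) of a hierarchical options policy in PyTorch: a high-level head selects a discrete option from the current observation alone, and an option-conditioned low-level head outputs a continuous action. The layer sizes, option count, and example option names are illustrative assumptions.

```python
# Minimal HOMDP-style policy sketch: discrete options on top,
# option-conditioned continuous actions below, from the current
# observation only (no observation history or LSTM required).
import torch
import torch.nn as nn

class HierarchicalOptionsPolicy(nn.Module):
    def __init__(self, obs_dim: int, num_options: int, action_dim: int, hidden: int = 64):
        super().__init__()
        # Shared encoder over the current observation.
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        # High-level head: logits over discrete options
        # (e.g. stop / creep / go -- names are assumptions).
        self.option_head = nn.Linear(hidden, num_options)
        # Low-level head: continuous action (e.g. target acceleration),
        # conditioned on the chosen option via a one-hot concatenation.
        self.action_head = nn.Sequential(
            nn.Linear(hidden + num_options, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, obs: torch.Tensor):
        h = self.encoder(obs)
        option_logits = self.option_head(h)
        option = torch.distributions.Categorical(logits=option_logits).sample()
        one_hot = nn.functional.one_hot(option, option_logits.shape[-1]).float()
        action = self.action_head(torch.cat([h, one_hot], dim=-1))
        return option, action

# Usage: one decision step from a single current observation.
policy = HierarchicalOptionsPolicy(obs_dim=10, num_options=3, action_dim=1)
opt, act = policy(torch.randn(1, 10))
```

Because both heads read only the current observation, the policy avoids the observation-history storage that makes POMDP-with-RL approaches expensive, which is the efficiency argument the abstract makes.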
BibTeX
@conference{Qiao-2018-113466,
  author = {Zhiqian Qiao and Katharina Muelling and John Dolan and Praveen Palanisamy and Priyantha Mudalige},
  title = {POMDP and Hierarchical Options MDP with Continuous Actions for Autonomous Driving at Intersections},
  booktitle = {Proceedings of IEEE Intelligent Transportation Systems Conference (ITSC '18)},
  year = {2018},
  month = {November},
  pages = {2377--2382},
  keywords = {autonomous driving, deep learning, reinforcement learning, curriculum learning, intersections, POMDP},
}