POMDP and Hierarchical Options MDP with Continuous Actions for Autonomous Driving at Intersections
Abstract
When applying autonomous driving technology to real-world scenarios, environmental uncertainties make the development of decision-making algorithms difficult. Modeling the problem as a Partially Observable Markov Decision Process (POMDP) [1] allows the algorithm to account for these uncertainties in the decision process, which makes it more robust to real sensor characteristics. However, solving a POMDP with reinforcement learning (RL) [2] often requires storing a large number of observations, and for continuous action spaces the system is computationally inefficient. This paper addresses these problems by modeling the problem as an MDP and learning a policy with RL using hierarchical options (HOMDP). The proposed algorithm stores state-action pairs and uses only the current observation to solve a POMDP problem. We compare the results of our approach to the time-to-collision method [3] and a POMDP-with-LSTM method. Our results show that the HOMDP approach improves the agent's performance on a four-way intersection task with two-way stop signs. The HOMDP method can generate both higher-level discrete options and lower-level continuous actions using only the observations of the current step.
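To make the two-level structure described above concrete, the following is a minimal sketch (not the authors' implementation) of a hierarchical options policy in PyTorch: a high-level head selects a discrete option from the current observation alone, and an option-conditioned low-level head outputs a continuous action. The layer sizes, option count, and example option names are illustrative assumptions.

```python
# Minimal HOMDP-style policy sketch: discrete options on top,
# option-conditioned continuous actions below, from the current
# observation only (no observation history or LSTM required).
import torch
import torch.nn as nn

class HierarchicalOptionsPolicy(nn.Module):
    def __init__(self, obs_dim: int, num_options: int, action_dim: int, hidden: int = 64):
        super().__init__()
        # Shared encoder over the current observation.
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        # High-level head: logits over discrete options
        # (e.g. stop / creep / go -- names are assumptions).
        self.option_head = nn.Linear(hidden, num_options)
        # Low-level head: continuous action (e.g. target acceleration),
        # conditioned on the chosen option via a one-hot concatenation.
        self.action_head = nn.Sequential(
            nn.Linear(hidden + num_options, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, obs: torch.Tensor):
        h = self.encoder(obs)
        option_logits = self.option_head(h)
        option = torch.distributions.Categorical(logits=option_logits).sample()
        one_hot = nn.functional.one_hot(option, option_logits.shape[-1]).float()
        action = self.action_head(torch.cat([h, one_hot], dim=-1))
        return option, action

# Usage: one decision step from a single current observation.
policy = HierarchicalOptionsPolicy(obs_dim=10, num_options=3, action_dim=1)
opt, act = policy(torch.randn(1, 10))
```

Because both heads read only the current observation, the policy avoids the observation-history storage that makes POMDP-with-RL approaches expensive, which is the efficiency argument the abstract makes.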
BibTeX
@conference{Qiao-2018-113466,
  author = {Zhiqian Qiao and Katharina Muelling and John Dolan and Praveen Palanisamy and Priyantha Mudalige},
  title = {POMDP and Hierarchical Options MDP with Continuous Actions for Autonomous Driving at Intersections},
  booktitle = {Proceedings of IEEE Intelligent Transportation Systems Conference (ITSC '18)},
  year = {2018},
  month = {November},
  pages = {2377--2382},
  keywords = {autonomous driving, deep learning, reinforcement learning, curriculum learning, intersections, POMDP},
}