Safe Deep Policy Adaptation

Wenli Xiao, Tairan He, John M. Dolan, and Guanya Shi

Conference Paper, Proceedings of (ICRA) International Conference on Robotics and Automation, pp. 17286-17292, May, 2024

Abstract

A critical goal of autonomy and artificial intelligence is enabling autonomous robots to rapidly adapt in dynamic and uncertain environments. Classic adaptive control and safe control provide stability and safety guarantees but are limited to specific system classes. In contrast, policy adaptation based on reinforcement learning (RL) offers versatility and generalizability but presents safety and robustness challenges. We propose SafeDPA, a novel RL and control framework that simultaneously tackles the problems of policy adaptation and safe reinforcement learning. SafeDPA jointly learns adaptive policy and dynamics models in simulation, predicts environment configurations, and fine-tunes dynamics models with few-shot real-world data. A safety filter based on the Control Barrier Function (CBF) on top of the RL policy is introduced to ensure safety during real-world deployment. We provide theoretical safety guarantees of SafeDPA and show the robustness of SafeDPA against learning errors and extra perturbations. Comprehensive experiments on (1) classic control problems (Inverted Pendulum), (2) simulation benchmarks (Safety Gym), and (3) a real-world agile robotics platform (RC Car) demonstrate great superiority of SafeDPA in both safety and task performance, over state-of-the-art baselines. Particularly, SafeDPA demonstrates notable generalizability, achieving a 300% increase in safety rate compared to the baselines, under unseen disturbances in real-world experiments.
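
To illustrate the general idea of a CBF-based safety filter placed on top of a policy, here is a minimal sketch. It is not the SafeDPA implementation: it assumes a hypothetical one-dimensional single-integrator system x_next = x + u * dt with a known model, a safe set x <= x_max, and the discrete-time CBF condition h(x_next) >= (1 - alpha) * h(x) with h(x) = x_max - x. The names cbf_safety_filter and rollout, and the values of x_max, alpha, and dt, are illustrative assumptions.

```python
def cbf_safety_filter(x, u_nominal, x_max=1.0, alpha=0.5, dt=0.1):
    """Return the action closest to u_nominal that satisfies the
    discrete-time CBF condition h(x_next) >= (1 - alpha) * h(x).

    Assumed toy setting (not from the paper):
      dynamics  x_next = x + u * dt
      barrier   h(x)   = x_max - x   (safe when h(x) >= 0)
    """
    h = x_max - x
    # Substituting the dynamics into the CBF condition gives
    #   x_max - (x + u * dt) >= (1 - alpha) * h   <=>   u <= alpha * h / dt
    u_upper = alpha * h / dt
    # In one dimension, the minimally invasive correction is a clip.
    return min(u_nominal, u_upper)


def rollout(x0, u_nominal, steps=20, x_max=1.0, alpha=0.5, dt=0.1):
    """Apply a constant nominal action through the filter and record states."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        u = cbf_safety_filter(x, u_nominal, x_max, alpha, dt)
        x = x + u * dt
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    # An aggressive nominal action that would leave the safe set if unfiltered.
    states = rollout(x0=0.0, u_nominal=5.0)
    print(max(states))
```

In this toy setting the filter leaves the nominal action untouched whenever it already satisfies the condition, and otherwise clips it so that the distance to the boundary shrinks by at most a factor of (1 - alpha) per step. For higher-dimensional systems or learned dynamics models, the same condition is usually enforced by solving a small constrained optimization problem rather than by a closed-form clip.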

BibTeX

@conference{Xiao-2024-143683,
author = {Wenli Xiao and Tairan He and John M. Dolan and Guanya Shi},
title = {Safe Deep Policy Adaptation},
booktitle = {Proceedings of (ICRA) International Conference on Robotics and Automation},
year = {2024},
month = {May},
pages = {17286-17292},
keywords = {adaptive control, safe control, policy adaptation, reinforcement learning, control barrier functions},
}
