Abstract:
Reinforcement Learning (RL) holds great promise for autonomous agents. However, when robots operate in safety-critical domains, the system must be robust enough for real-world deployment. For example, the robot should be able to perform across the different scenarios it will encounter. It should avoid entering undesirable and irreversible states, such as crashing into obstacles, and should ideally satisfy its safety constraints even when its primary goals cannot be achieved.
One way to improve an RL agent’s robustness is to explore a variety of scenarios, environment parameters, and opponent policies via domain randomization. However, as an agent’s performance improves, it becomes less likely to explore the regions where it still performs poorly. One approach to this problem is adversarial training, in which an adversarial agent injects noise to degrade the ego agent’s performance. In such a setup, however, it is much easier for the adversary to overpower the ego agent, so the ego agent often fails to overcome the adversarial noise without expert supervision. Moreover, as robots move into more unstructured environments, environmental factors can alter the distribution of states and dynamics more than can be captured as noise.
In my thesis, I will discuss how curriculum learning can help an agent efficiently explore a variety of situations, opponents, and dynamics in order to achieve robust performance. The first part of the thesis introduces ideas from curriculum learning and shows how they can be used to explore a wide range of environments. The second part extends these concepts to the multi-agent domain, examining how curriculum learning can help find policies that are robust in collaborative and competitive settings, both symmetric and asymmetric. Finally, I will extend these findings to the Quality Diversity domain and explore how curriculum learning can help discover a library of behaviors that cumulatively achieves robustness.
Thesis Committee Members:
Jeff Schneider, Chair
David Held
Zachary Manchester
Jeff Clune, University of British Columbia