Abstract:
Reinforcement Learning (RL) holds great promise for autonomous agents. However, when robots operate in safety-critical domains, a system must be robust enough to be deployed in the real world. For example, the robot should perform reliably across the different scenarios it will encounter. It should avoid entering undesirable and irreversible states, such as crashing into obstacles, and ideally should satisfy safety constraints even when its primary goals cannot be achieved.
One way to improve an RL agent’s robustness is to expose it to a variety of scenarios via domain randomization. However, as an agent’s performance improves, it becomes less likely to explore the regions where it still performs poorly. One approach to this problem is adversarial training, in which an adversarial agent injects noise to degrade the ego agent’s performance. In such a setup, however, it is often much easier for the adversary to win, and the ego agent frequently fails to overcome the adversarial noise without expert supervision. Moreover, as robots move into more unstructured environments, environmental factors can alter the state distribution and dynamics in ways that cannot be captured simply as noise.
In my thesis, I propose using curriculum learning and online adaptation to prepare an agent for the variety of situations it may encounter. In the completed work, I will discuss how curriculum learning with a genetic algorithm provides a new framework to efficiently explore the scenario space and helps an agent learn a more generalizable policy across different starting states, dynamics, opponent policies, and environmental changes. In the proposed work, I will discuss how online adaptation can allow an agent to adapt to a wider variety of seen and unseen scenarios, as well as how curriculum learning can be used to train an agent to be more adaptive.
Thesis Committee Members:
Jeff Schneider, Chair
David Held
Zachary Manchester
Jeff Clune, University of British Columbia