On Combining Reinforcement Learning & Adversarial Training
Abstract
Reinforcement Learning (RL) allows us to train an agent to excel at a given sequential decision-making task by optimizing for a reward signal. Adversarial training involves a joint optimization scheme in which an agent and an adversary compete against each other. In this work, we explore several problem settings that combine RL and adversarial training, yielding practical learning algorithms. Some settings use adversarial training as a tool to improve the RL agent's performance, whereas others have an adversary built into the problem statement. We explore both kinds of scenarios and propose new algorithms that outperform existing ones.
1) We formulate a new class of Actor Residual Critic (ARC) RL algorithms for improving Adversarial Imitation Learning (AIL). Unlike most RL settings, the reward in AIL is differentiable, and to leverage the exact gradient of the reward we propose ARC in place of standard Actor-Critic algorithms (a brief sketch of the idea appears after this list). ARC-aided AIL algorithms outperform existing AIL algorithms on continuous-control tasks.
2) We create a new multi-agent mixed cooperative-competitive simulation environment called FortAttack that addresses several limitations of existing multi-agent environments. We further show that complex multi-agent strategies involving coordination and heterogeneous task allocation can emerge naturally from scratch through competition between two teams of agents.
3) The leader-follower strategy is popular in multi-robot navigation. When such a team is on a critical mission in the presence of an external adversarial observer (enemy), hiding the leader's identity is essential: an adversary wishing to sabotage the mission can simply identify and harm the leader, compromising the whole team. We propose a defense mechanism that hides the leader's identity by ensuring the leader moves in a way that behaviorally camouflages it among the followers, making it difficult for an adversary to identify the leader. We further show that our multi-robot policies, trained purely in simulation, generalize to real human observers, who likewise struggle to identify the leader.
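To make the ARC idea in 1) concrete, here is a minimal sketch in our own notation (r_psi for the learned AIL reward, C_phi for the residual critic, pi_theta for the policy); the thesis's exact formulation may differ. A standard critic models the full return Q(s,a), whereas a residual critic models only the return from the next step onward, so the immediate reward's exact gradient can be used in the policy update:

% Sketch (assumed notation), not the thesis's exact equations.
% Standard critic:  Q(s,a) \approx r_\psi(s,a) + \gamma\, \mathbb{E}\left[V(s')\right]
% Residual critic:  C_\phi(s,a) \approx \gamma\, \mathbb{E}\left[V(s')\right],
%                   so that Q(s,a) = r_\psi(s,a) + C_\phi(s,a).
\nabla_\theta J(\theta) \approx
  \mathbb{E}_{s}\!\left[
    \nabla_\theta \pi_\theta(s)\,
    \nabla_a \big( r_\psi(s,a) + C_\phi(s,a) \big)\big|_{a=\pi_\theta(s)}
  \right]

Because the AIL reward r_psi comes from a differentiable (discriminator-based) function, the term nabla_a r_psi(s,a) is exact rather than approximated through a learned critic; only the residual term C_phi is learned.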
BibTeX
@mastersthesis{Deka-2021-129106,
author = {Ankur Deka},
title = {On Combining Reinforcement Learning \& Adversarial Training},
year = {2021},
month = {July},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-21-32},
keywords = {Reinforcement Learning, Adversarial Training, Multi-Agent Reinforcement Learning, Imitation Learning, Leader-follower navigation},
}