Simultaneous Policy and Discrete Communication Learning for Multi-Agent Cooperation
Abstract
Decentralized multi-agent reinforcement learning has been demonstrated to be an effective solution to large multi-agent control problems. However, agents can typically make decisions based only on local information, resulting in suboptimal performance in partially-observable settings. The addition of a communication channel overcomes this limitation by allowing agents to exchange information. Existing approaches, however, have required agent output size to scale exponentially with the number of message bits, and have been slow to converge to satisfactory policies due to the added difficulty of learning message selection. We propose an independent bitwise message policy parameterization that allows agent output size to scale linearly with information content. Additionally, we leverage aspects of the environment structure to derive a novel policy gradient estimator that is both unbiased and has a lower-variance message gradient contribution than typical policy gradient estimators. We evaluate the impact of these two contributions on a collaborative multi-agent robot navigation problem in which information must be exchanged among agents. We find that both significantly improve sample efficiency and result in improved final policies, and demonstrate the applicability of these techniques by deploying the learned policies on physical robots.
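To illustrate the scaling argument, the sketch below (not the authors' implementation; all names, dimensions, and the linear feature map are hypothetical) shows a factorized per-bit message policy in Python/NumPy: each of K message bits gets one independent Bernoulli logit, so the policy output grows linearly in K rather than requiring a softmax over all 2^K possible messages, and the joint log-probability factorizes as a sum over bits.

import numpy as np

rng = np.random.default_rng(0)

def bitwise_message_policy(obs_features, weights):
    """Factorized (per-bit) message policy sketch.

    One Bernoulli logit per message bit, so the output size grows
    linearly with the number of bits K instead of the 2^K outputs
    a single softmax over all messages would need. The paper's actual
    network architecture and gradient estimator are not reproduced here.
    """
    logits = obs_features @ weights            # shape: (K,)
    probs = 1.0 / (1.0 + np.exp(-logits))      # independent per-bit probabilities
    bits = (rng.random(probs.shape) < probs).astype(np.int8)
    # Joint log-probability of the sampled message factorizes over bits.
    log_prob = np.sum(bits * np.log(probs) + (1 - bits) * np.log1p(-probs))
    return bits, log_prob

# Toy usage: a 16-dim observation feature mapped to an 8-bit message.
obs_features = rng.standard_normal(16)
weights = rng.standard_normal((16, 8))
message, log_prob = bitwise_message_policy(obs_features, weights)
print(message, log_prob)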
BibTeX
@article{Freed-2020-122891,
author = {Benjamin Freed and Guillaume Sartoretti and Howie Choset},
title = {Simultaneous Policy and Discrete Communication Learning for Multi-Agent Cooperation},
journal = {IEEE Robotics and Automation Letters},
year = {2020},
month = {April},
volume = {5},
number = {2},
pages = {2498--2505},
}