Multi-Agent Planning – Partial Reward Decoupling for Efficient Large-Scale Multi-Agent Learning
Project Head: Howie Choset

One of the preeminent obstacles to scaling multi-agent reinforcement learning (MARL) to large numbers of agents is assigning credit to individual agents’ actions. This project addresses this credit assignment problem with an approach we call partial reward decoupling (PRD). PRD decomposes large cooperative multi-agent RL problems into decoupled subproblems involving subsets of agents, thereby simplifying credit assignment. Our initial work has empirically demonstrated that applying PRD within both actor-critic and proximal policy optimization algorithms yields lower-variance policy gradient estimates, which improves data efficiency, learning stability, and asymptotic performance across a wide array of multi-agent RL tasks, compared to related approaches that do not use PRD, such as counterfactual multi-agent policy gradient (COMA), a state-of-the-art MARL algorithm. A sketch of the core idea follows below.
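The following is a minimal sketch, not the authors' implementation, of the variance-reduction idea behind PRD: each agent's advantage estimate sums credit terms only from agents in its "relevant set" rather than from the full team. The function name `prd_advantages` and the boolean `relevance` mask are illustrative assumptions; in the published work the relevant sets are learned rather than supplied by hand.

```python
import numpy as np

def prd_advantages(per_agent_rewards, per_agent_baselines, relevance):
    """Per-agent advantage estimates under a PRD-style decomposition.

    per_agent_rewards:   (T, N) rewards attributed to each of N agents per timestep
    per_agent_baselines: (T, N) critic baselines for each agent
    relevance:           (N, N) boolean mask; relevance[i, j] = True if agent j's
                         reward stream is treated as relevant to agent i's actions
    Returns:             (T, N) advantage estimates, one column per agent
    """
    deltas = per_agent_rewards - per_agent_baselines  # per-agent credit terms
    # Agent i's advantage sums only the columns j in its relevant set.
    return deltas @ relevance.astype(float).T

# Toy usage: 3 agents, 5 timesteps.
T, N = 5, 3
rng = np.random.default_rng(0)
rewards = rng.normal(size=(T, N))
baselines = np.zeros((T, N))

fully_coupled = np.ones((N, N), dtype=bool)  # every agent credited with all rewards
decoupled = np.eye(N, dtype=bool)            # each agent credited only with its own rewards
print(prd_advantages(rewards, baselines, fully_coupled).var())  # higher-variance estimates
print(prd_advantages(rewards, baselines, decoupled).var())      # lower-variance estimates
```

Summing fewer reward streams per agent removes the noise contributed by agents whose behavior is irrelevant to a given agent's actions, which is the source of the variance reduction described above.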

[1] Freed, Benjamin, et al. “Learning Cooperative Multi-Agent Policies With Partial Reward Decoupling.” IEEE Robotics and Automation Letters 7.2 (2021): 890-897. https://ieeexplore.ieee.org/abstract/document/9653841

