Sample Efficient Training in Multi-Agent Adversarial Games with Limited Teammate Communication

Paper Title

Sample Efficient Training in Multi-Agent Adversarial Games with Limited Teammate Communication

Authors

Hardik Meisheri, Harshad Khadilkar

Abstract

We describe our solution approach for Pommerman TeamRadio, a competition environment associated with NeurIPS 2019. The defining feature of our algorithm is that it achieves sample efficiency within a restrictive computational budget while beating the previous years' learning agents. The proposed algorithm (i) uses imitation learning to seed the policy, (ii) explicitly defines the communication protocol between the two teammates, (iii) shapes the reward to provide a richer feedback signal to each agent during training, and (iv) uses masking to rule out catastrophically bad actions. We describe extensive tests against baselines, including those from the 2019 competition leaderboard, as well as a specific investigation of the learned policy and the effect of each modification on performance. We show that the proposed approach achieves competitive performance within half a million games of training, significantly faster than other studies in the literature.
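
The abstract does not say how the masking in item (iv) is implemented. As a rough illustration only, one common way to mask actions is to zero out the probability of forbidden actions before sampling from the policy. The sketch below assumes a policy that outputs one logit per action; the function name `masked_softmax`, the logits, and the mask values are all hypothetical and are not taken from the paper.

```python
import math


def masked_softmax(logits, allowed):
    """Softmax over `logits`, with disallowed actions forced to probability 0.

    `allowed` is a list of booleans, one per action (an assumed interface,
    not the paper's).
    """
    if len(logits) != len(allowed):
        raise ValueError("logits and allowed must have the same length")
    if not any(allowed):
        raise ValueError("at least one action must be allowed")
    # Subtract the largest allowed logit for numerical stability.
    peak = max(l for l, ok in zip(logits, allowed) if ok)
    weights = [math.exp(l - peak) if ok else 0.0
               for l, ok in zip(logits, allowed)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: six actions, with actions 1 and 4 judged
# catastrophic in the current state and therefore masked out.
logits = [0.5, 2.0, -1.0, 0.0, 3.0, 0.1]
allowed = [True, False, True, True, False, True]
probs = masked_softmax(logits, allowed)
```

Sampling from `probs` can never select a masked action, even when the raw policy assigns it the highest logit, so the agent does not spend training games rediscovering that such actions are fatal.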
