Paper Title

Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge?

Paper Authors

Christian Schroeder de Witt, Tarun Gupta, Denys Makoviichuk, Viktor Makoviychuk, Philip H. S. Torr, Mingfei Sun, Shimon Whiteson

Paper Abstract

Most recently developed approaches to cooperative multi-agent reinforcement learning in the \emph{centralized training with decentralized execution} setting involve estimating a centralized, joint value function. In this paper, we demonstrate that, despite its various theoretical shortcomings, Independent PPO (IPPO), a form of independent learning in which each agent simply estimates its local value function, can perform just as well as or better than state-of-the-art joint learning approaches on the popular multi-agent benchmark suite SMAC with little hyperparameter tuning. We also compare IPPO to several variants; the results suggest that IPPO's strong performance may be due to its robustness to some forms of environment non-stationarity.
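To make the abstract's core distinction concrete, the following is a minimal, illustrative sketch of independent learning: each agent maintains only a local value estimate over its own observation and learns from the shared team reward, with the other agents implicitly folded into a (non-stationary) environment. This is a hypothetical tabular TD-learning toy, not the paper's actual IPPO implementation, which uses PPO with neural-network policies and value functions.

```python
class IndependentAgent:
    """One independent learner with a LOCAL value table (illustrative only)."""

    def __init__(self, lr=0.1, gamma=0.9):
        self.values = {}   # local value estimates: obs -> V(obs)
        self.lr = lr
        self.gamma = gamma

    def value(self, obs):
        return self.values.get(obs, 0.0)

    def td_update(self, obs, reward, next_obs):
        # One-step TD update using only this agent's local observation;
        # no centralized, joint value function is ever formed.
        target = reward + self.gamma * self.value(next_obs)
        self.values[obs] = self.value(obs) + self.lr * (target - self.value(obs))


# Two agents learn from the same team reward, but each sees only its own
# slice of the joint observation (a toy two-step loop, for illustration).
agents = [IndependentAgent() for _ in range(2)]
trajectory = [(("s0", "t0"), 1.0, ("s1", "t1")),
              (("s1", "t1"), 0.0, ("s0", "t0"))]

for _ in range(50):
    for joint_obs, team_reward, next_joint_obs in trajectory:
        for i, agent in enumerate(agents):
            agent.td_update(joint_obs[i], team_reward, next_joint_obs[i])
```

From each agent's perspective the others' changing policies make the environment non-stationary; the abstract's finding is that IPPO is empirically robust to some forms of this non-stationarity despite that theoretical concern.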
