Paper Title
Classifying Ambiguous Identities in Hidden-Role Stochastic Games with Multi-Agent Reinforcement Learning
Paper Authors
Paper Abstract
Multi-agent reinforcement learning (MARL) is a prevalent learning paradigm for solving stochastic games. In most MARL studies, agents in a game are defined as teammates or enemies beforehand, and the relationships among the agents remain fixed throughout the game. However, in real-world problems, agent relationships are commonly unknown in advance or change dynamically. Many multi-party interactions start off by asking: who is on my team? This question arises whether it is the first day at the stock exchange or in kindergarten. Therefore, training policies for such situations in the face of imperfect information and ambiguous identities is an important problem that needs to be addressed. In this work, we develop a novel identity detection reinforcement learning (IDRL) framework that allows an agent to dynamically infer the identities of nearby agents and select an appropriate policy to accomplish the task. In the IDRL framework, a relation network is constructed to deduce the identities of other agents by observing their behaviors. A danger network is optimized to estimate the risk of false-positive identifications. Beyond that, we propose an intrinsic reward that balances the need to maximize external rewards against the need for accurate identification. After identifying the cooperation-competition pattern among the agents, IDRL applies one of the off-the-shelf MARL methods to learn the policy. To evaluate the proposed method, we conduct experiments on the Red-10 card-shedding game, and the results show that IDRL achieves superior performance over other state-of-the-art MARL methods. Impressively, the relation network identifies the identities of agents on par with top human players, and the danger network reasonably avoids the risk of imperfect identification. The code to reproduce all the reported results is available online at https://github.com/MR-BENjie/IDRL.
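The abstract describes an intrinsic reward that trades off maximizing the external reward against identifying other agents accurately. The paper's exact formulation is not given here, so the following is only a minimal illustrative sketch: it assumes the relation network outputs a per-agent probability distribution over roles, and adds a hypothetical log-likelihood bonus (weighted by an assumed coefficient `beta`) for the identities later revealed to be correct.

```python
import numpy as np

def combined_reward(external_reward, identity_probs, true_identities, beta=0.1):
    """Illustrative reward mixing (not the paper's exact formula).

    identity_probs:  (n_agents, n_roles) role distribution predicted by
                     a relation-network-style classifier (assumption).
    true_identities: (n_agents,) indices of the roles eventually revealed.
    beta:            hypothetical weight trading off the two objectives.
    """
    n = len(true_identities)
    # Log-likelihood of the revealed identities under the predictions:
    # larger (closer to 0) when identification is accurate.
    log_lik = np.log(identity_probs[np.arange(n), true_identities] + 1e-8)
    intrinsic = log_lik.mean()
    # Accurate identification raises the combined signal; confident
    # misidentification lowers it.
    return external_reward + beta * intrinsic
```

Under this sketch, an agent that assigns high probability to the correct identities receives nearly the full external reward, while one that is confidently wrong is penalized, which is one simple way to realize the balance the abstract describes.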