Paper Title

Learning to Incentivize Other Learning Agents

Authors

Jiachen Yang, Ang Li, Mehrdad Farajtabar, Peter Sunehag, Edward Hughes, Hongyuan Zha

Abstract

The challenge of developing powerful and general Reinforcement Learning (RL) agents has received increasing attention in recent years. Much of this effort has focused on the single-agent setting, in which an agent maximizes a predefined extrinsic reward function. However, a long-term question inevitably arises: how will such independent agents cooperate when they are continually learning and acting in a shared multi-agent environment? Observing that humans often provide incentives to influence others' behavior, we propose to equip each RL agent in a multi-agent environment with the ability to give rewards directly to other agents, using a learned incentive function. Each agent learns its own incentive function by explicitly accounting for its impact on the learning of recipients and, through them, the impact on its own extrinsic objective. We demonstrate in experiments that such agents significantly outperform standard RL and opponent-shaping agents in challenging general-sum Markov games, often by finding a near-optimal division of labor. Our work points toward more opportunities and challenges along the path to ensure the common good in a multi-agent future.
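The core mechanism described in the abstract — each agent augmenting other agents' rewards through a learned incentive function — can be sketched minimally as follows. This is an illustrative assumption, not the paper's implementation: the class and function names are hypothetical, the incentive function is reduced to a linear table over recipient actions, and the giver's meta-gradient update through the recipient's learning step is omitted.

```python
import numpy as np

class IncentiveAgent:
    """Hypothetical minimal agent with a learnable incentive function."""

    def __init__(self, n_actions, rng):
        # Linear incentive function: one learnable weight per recipient action.
        # In the paper this would be a parameterized function trained by
        # differentiating through the recipient's policy update.
        self.incentive_weights = rng.normal(scale=0.1, size=n_actions)

    def give_incentive(self, recipient_action):
        # Extra reward this agent hands to another agent, chosen by its
        # learned incentive function based on the recipient's behavior.
        return float(self.incentive_weights[recipient_action])

def total_reward(extrinsic, recipient_action, givers):
    # The recipient's learning signal is the environment's extrinsic reward
    # plus the incentives given by all other agents.
    return extrinsic + sum(g.give_incentive(recipient_action) for g in givers)

rng = np.random.default_rng(0)
agent_a, agent_b = IncentiveAgent(2, rng), IncentiveAgent(2, rng)
# Agent b takes action 1 and receives extrinsic reward 1.0 plus a's incentive.
r_b = total_reward(1.0, 1, [agent_a])
```

The key point of the method is what this sketch leaves out: `incentive_weights` are not trained on the giver's immediate reward, but by explicitly accounting for how the incentive changes the recipient's future policy and, through that change, the giver's own extrinsic return.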
