On Information Asymmetry in Competitive Multi-Agent Reinforcement Learning: Convergence and Optimality

Paper title

On Information Asymmetry in Competitive Multi-Agent Reinforcement Learning: Convergence and Optimality

Authors

Ezra Tampubolon, Haris Ceribasic, Holger Boche

Abstract

In this work, we study a system of two interacting non-cooperative Q-learning agents, where one agent has the privilege of observing the other's actions. We show that this information asymmetry can lead to a stable outcome of population learning, which generally does not occur in a setting of independent learners. The resulting post-learning policies are almost optimal in the underlying game sense, i.e., they form a Nash equilibrium. Furthermore, we propose a Q-learning algorithm that requires predictive observation of the opponent's two subsequent actions and yields an optimal strategy given that the opponent applies a stationary strategy, and we discuss the existence of a Nash equilibrium in the underlying information-asymmetric game.
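The information asymmetry described in the abstract can be illustrated with a toy sketch. The code below is not the paper's algorithm: it assumes a stateless, repeated 2x2 matching-pennies game (a hypothetical payoff chosen for illustration), a bandit-style Q-update without discounting, and epsilon-greedy exploration. The only feature taken from the abstract is that one agent (here called the follower) observes the other agent's action before choosing its own, so its Q-table is indexed by that observed action. The names `leader_reward`, `train`, and `greedy_response` are illustrative.

```python
import random


def leader_reward(a, b):
    """Hypothetical payoff: the leader wins (+1) if actions match, else loses (-1)."""
    return 1.0 if a == b else -1.0


def train(steps=5000, alpha=0.1, epsilon=0.2, seed=0):
    """Run two stateless Q-learners; the follower sees the leader's action first."""
    rng = random.Random(seed)
    q_leader = [0.0, 0.0]                  # Q1[a]: leader's value per own action
    q_follower = [[0.0, 0.0], [0.0, 0.0]]  # Q2[a][b]: indexed by observed leader action

    def eps_greedy(values):
        # Explore with probability epsilon, otherwise pick the first argmax.
        if rng.random() < epsilon:
            return rng.randrange(len(values))
        return max(range(len(values)), key=values.__getitem__)

    for _ in range(steps):
        a = eps_greedy(q_leader)           # leader acts without seeing the follower
        b = eps_greedy(q_follower[a])      # follower conditions on the observed a
        r = leader_reward(a, b)
        # Zero-sum game: the follower receives the negated leader reward.
        q_leader[a] += alpha * (r - q_leader[a])
        q_follower[a][b] += alpha * (-r - q_follower[a][b])
    return q_leader, q_follower


def greedy_response(q_follower, a):
    """Follower's greedy action after observing leader action a."""
    return max(range(2), key=q_follower[a].__getitem__)


if __name__ == "__main__":
    q1, q2 = train()
    print("leader Q-values:", q1)
    print("follower response to 0:", greedy_response(q2, 0))
    print("follower response to 1:", greedy_response(q2, 1))
```

In this toy game the follower's learned greedy response is to mismatch whatever the leader plays, which is its best response under the assumed payoff. This only hints at why observing the opponent can stabilize learning and says nothing about the convergence or equilibrium results claimed in the paper.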
