Paper title
On Information Asymmetry in Competitive Multi-Agent Reinforcement Learning: Convergence and Optimality
Paper authors
Paper abstract
In this work, we study a system of two interacting, non-cooperative Q-learning agents in which one agent has the privilege of observing the other's actions. We show that this information asymmetry can lead to a stable outcome of population learning, which generally does not occur among independent learners. The resulting post-learning policies are almost optimal in the sense of the underlying game, i.e., they form a Nash equilibrium. Furthermore, we propose a Q-learning algorithm that requires predictive observation of the opponent's two subsequent actions and yields an optimal strategy provided that the opponent applies a stationary strategy, and we discuss the existence of a Nash equilibrium in the underlying information-asymmetric game.
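To make the setting concrete, the following is a minimal sketch of two stateless Q-learners in a repeated 2x2 matrix game, where the second agent (the "follower") observes the first agent's action before choosing its own. It is an illustration under stated assumptions, not the algorithm proposed in the paper: the payoff matrices, the names `LEADER_PAYOFF`, `FOLLOWER_PAYOFF`, and `train`, and the hyperparameters are all hypothetical.

```python
import random

# Hypothetical 2x2 payoffs, indexed [leader_action][follower_action].
# These values are illustrative assumptions, not taken from the paper.
LEADER_PAYOFF = [[2.0, 0.0], [0.0, 1.0]]
FOLLOWER_PAYOFF = [[1.0, 0.0], [0.0, 1.0]]


def greedy(values):
    """Return the index of the largest value (first index on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def epsilon_greedy(values, epsilon, rng):
    """Explore uniformly with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    return greedy(values)


def train(steps=20000, alpha=0.1, epsilon=0.1, seed=0):
    """Run both learners and return their Q-tables."""
    rng = random.Random(seed)
    q_leader = [0.0, 0.0]                  # one value per own action
    q_follower = [[0.0, 0.0], [0.0, 0.0]]  # rows: observed leader action
    for _ in range(steps):
        a = epsilon_greedy(q_leader, epsilon, rng)
        # Information asymmetry: the follower conditions on the leader's action.
        b = epsilon_greedy(q_follower[a], epsilon, rng)
        # Stateless Q-learning update: move each estimate toward the reward.
        q_leader[a] += alpha * (LEADER_PAYOFF[a][b] - q_leader[a])
        q_follower[a][b] += alpha * (FOLLOWER_PAYOFF[a][b] - q_follower[a][b])
    return q_leader, q_follower


if __name__ == "__main__":
    q_leader, q_follower = train()
    print("leader greedy action:", greedy(q_leader))
    print("follower greedy responses:", [greedy(row) for row in q_follower])
```

In this toy game the follower's best response is to match the observed action, and the leader, anticipating that response through its learned values, prefers the action with the higher matched payoff. The resulting greedy policies are mutual best responses in the sequential game, which is the kind of stable outcome the abstract describes.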