Paper Title
Provable Fictitious Play for General Mean-Field Games
Paper Authors
Paper Abstract
We propose a reinforcement learning algorithm for stationary mean-field games, where the goal is to learn a pair of mean-field state and stationary policy that constitutes a Nash equilibrium. Viewing the mean-field state and the policy as two players, we propose a fictitious play algorithm which alternately updates the mean-field state and the policy via gradient descent and proximal policy optimization, respectively. Our algorithm stands in stark contrast with the previous literature, which solves each single-agent reinforcement learning problem induced by the iterated mean-field states to optimality. Furthermore, we prove that our fictitious play algorithm converges to the Nash equilibrium at a sublinear rate. To the best of our knowledge, this appears to be the first provably convergent single-loop reinforcement learning algorithm for mean-field games based on iterative updates of both the mean-field state and the policy.
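To make the single-loop structure concrete, below is a minimal sketch of an alternating update of this flavor on a small tabular mean-field game. It is not the paper's algorithm: the state/action sizes, the congestion-style reward `reward_fn`, the step sizes `eta_pi`/`eta_mu`, and the use of a softmax mirror-ascent step as a stand-in for proximal policy optimization are all illustrative assumptions. It only shows how the mean-field state and the policy can be updated jointly in one loop rather than solving each induced single-agent problem to optimality.

```python
import numpy as np

# Hypothetical tabular mean-field game (sizes and kernel are assumptions).
np.random.seed(0)
S, A = 5, 3                                          # number of states / actions
P = np.random.dirichlet(np.ones(S), size=(S, A))     # P[s, a] = next-state distribution
gamma = 0.9


def reward_fn(mu):
    """Assumed congestion-style reward r(s, a; mu): crowded states pay less."""
    congestion = -np.log(mu + 1e-8)
    return congestion[:, None] * np.ones((S, A))


def q_values(pi, mu, n_iter=200):
    """Policy evaluation: Q^pi under the reward induced by the mean-field state mu."""
    r = reward_fn(mu)
    Q = np.zeros((S, A))
    for _ in range(n_iter):
        V = (pi * Q).sum(axis=1)                     # V(s) = sum_a pi(a|s) Q(s, a)
        Q = r + gamma * P @ V
    return Q


def induced_state_distribution(pi, mu0, n_iter=200):
    """Approximate stationary state distribution of the chain induced by pi."""
    mu = mu0.copy()
    P_pi = np.einsum('sa,sax->sx', pi, P)            # state-to-state kernel under pi
    for _ in range(n_iter):
        mu = mu @ P_pi
    return mu


# Single loop: alternate a proximal (KL-regularized) policy step with a
# gradient-style averaging step on the mean-field state.
mu = np.ones(S) / S
pi = np.ones((S, A)) / A
eta_pi, eta_mu = 0.5, 0.1                            # step sizes (assumed)

for t in range(500):
    Q = q_values(pi, mu)
    # Policy update: mirror ascent pi <- pi * exp(eta_pi * Q), renormalized per state.
    logits = np.log(pi + 1e-12) + eta_pi * Q
    pi = np.exp(logits - logits.max(axis=1, keepdims=True))
    pi /= pi.sum(axis=1, keepdims=True)
    # Mean-field update: move mu toward the distribution induced by the current policy.
    mu_pi = induced_state_distribution(pi, mu)
    mu = (1 - eta_mu) * mu + eta_mu * mu_pi

print("mean-field state:", np.round(mu, 3))
```

In this sketch, neither sub-problem is solved exactly at any iteration: the policy takes one proximal step against the current mean-field state, and the mean-field state takes one averaging step toward the distribution induced by the current policy, which mirrors the single-loop fictitious-play idea described in the abstract.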