Paper Title


Global Convergence of Policy Gradient for Linear-Quadratic Mean-Field Control/Game in Continuous Time

Authors

Weichen Wang, Jiequn Han, Zhuoran Yang, Zhaoran Wang

Abstract


Reinforcement learning is a powerful tool for learning the optimal policy of possibly multiple agents by interacting with the environment. As the number of agents grows very large, the system can be approximated by a mean-field problem, which has motivated new research directions in mean-field control (MFC) and mean-field games (MFG). In this paper, we study the policy gradient method for linear-quadratic mean-field control and games, where we assume each agent has identical linear state transitions and quadratic cost functions. While most recent work on policy gradient for MFC and MFG is based on discrete-time models, we focus on continuous-time models, where some of the analysis techniques may be of interest to readers. For both MFC and MFG, we provide a policy gradient update and show that it converges to the optimal solution at a linear rate, which is verified by a synthetic simulation. For MFG, we also provide sufficient conditions for the existence and uniqueness of the Nash equilibrium.
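
To make the linear-quadratic setting concrete, here is a minimal illustrative sketch of a policy-gradient iteration on a scalar, single-agent, continuous-time LQR (no mean-field term), which is not the paper's actual update. The dynamics dx/dt = (a - b*k)x, the cost weights q and r, the initial state x0, the step size eta, and the finite-difference gradient estimate are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical scalar continuous-time LQR: dx/dt = (a - b*k) * x under the
# linear feedback policy u = -k*x, with quadratic cost
#   J(k) = \int_0^\infty (q + r*k^2) * x(t)^2 dt
#        = (q + r*k^2) * x0^2 / (2*(b*k - a)),
# valid whenever the closed loop is stable, i.e. b*k - a > 0.
a, b, q, r, x0 = 1.0, 1.0, 1.0, 1.0, 1.0

def cost(k):
    assert b * k - a > 0, "policy gain must stabilize the closed loop"
    return (q + r * k ** 2) * x0 ** 2 / (2.0 * (b * k - a))

def grad(k, eps=1e-6):
    # central finite-difference surrogate for dJ/dk (illustrative only)
    return (cost(k + eps) - cost(k - eps)) / (2.0 * eps)

k, eta = 3.0, 0.1  # initial stabilizing gain and step size (both hypothetical)
k_star = (a + np.sqrt(a ** 2 + b ** 2 * q / r)) / b  # optimum from the scalar Riccati equation
for _ in range(200):
    k -= eta * grad(k)  # plain gradient step on the policy parameter

print(f"learned k = {k:.6f}, Riccati optimum = {k_star:.6f}")
```

Under these assumptions, the gradient iterates contract toward the Riccati optimum, loosely mirroring the linear-rate convergence the paper establishes for its full mean-field policy gradient update.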
