Paper Title

Decentralized Natural Policy Gradient with Variance Reduction for Collaborative Multi-Agent Reinforcement Learning

Authors

Jinchi Chen, Jie Feng, Weiguo Gao, Ke Wei

Abstract

This paper studies a policy optimization problem arising from collaborative multi-agent reinforcement learning in a decentralized setting where agents communicate with their neighbors over an undirected graph to maximize the sum of their cumulative rewards. A novel decentralized natural policy gradient method, dubbed Momentum-based Decentralized Natural Policy Gradient (MDNPG), is proposed, which incorporates natural gradient, momentum-based variance reduction, and gradient tracking into the decentralized stochastic gradient ascent framework. The $\mathcal{O}(n^{-1}\epsilon^{-3})$ sample complexity for MDNPG to converge to an $\epsilon$-stationary point has been established under standard assumptions, where $n$ is the number of agents. It indicates that MDNPG can achieve the optimal convergence rate for decentralized policy gradient methods and possesses a linear speedup in contrast to centralized optimization methods. Moreover, superior empirical performance of MDNPG over other state-of-the-art algorithms has been demonstrated by extensive numerical experiments.
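
For intuition, the following is a minimal numerical sketch (not the authors' implementation) of an MDNPG-style update that combines the three ingredients named in the abstract: momentum-based (STORM-style) variance reduction, gradient tracking over a doubly stochastic mixing matrix, and a natural-gradient preconditioner. The toy quadratic objective, the ring-graph mixing matrix W, the step size eta, the momentum parameter beta, and the identity Fisher approximation are all placeholder assumptions chosen only to keep the example self-contained and runnable.

import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 3                    # number of agents, parameter dimension (placeholders)
T, eta, beta = 200, 0.05, 0.4  # iterations, step size, momentum parameter (placeholders)

# Doubly stochastic mixing matrix for a ring graph (placeholder topology).
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i + 1) % n] = 0.25
    W[i, (i - 1) % n] = 0.25

# Each agent's local objective is a toy concave quadratic; its noisy gradient
# stands in for a sampled local policy gradient.
targets = rng.normal(size=(n, d))

def stoch_grad(i, theta_i, noise):
    # stochastic ascent direction for agent i, evaluated with a given sample
    return -(theta_i - targets[i]) + noise

theta = np.zeros((n, d))  # local policy parameters
u = np.array([stoch_grad(i, theta[i], 0.1 * rng.normal(size=d)) for i in range(n)])
y = u.copy()              # gradient-tracking variables

for t in range(T):
    # Natural-gradient step: precondition the tracked gradient by an approximate
    # inverse Fisher matrix (identity used here as a placeholder), then mix with
    # neighbors (consensus) and ascend.
    fisher_inv = np.eye(d)
    new_theta = W @ theta + eta * (y @ fisher_inv)

    # Momentum-based variance reduction (STORM-style): reuse one sample per agent
    # at both the new and old parameters so the noise partially cancels.
    new_u = np.empty_like(u)
    for i in range(n):
        noise = 0.1 * rng.normal(size=d)
        g_new = stoch_grad(i, new_theta[i], noise)
        g_old = stoch_grad(i, theta[i], noise)
        new_u[i] = g_new + (1 - beta) * (u[i] - g_old)

    # Gradient tracking: each agent tracks the network-average gradient estimate.
    y = W @ y + new_u - u
    theta, u = new_theta, new_u

print("consensus error:", np.linalg.norm(theta - theta.mean(axis=0)))

Running the sketch shows the local parameters reaching consensus while ascending toward the average of the agents' optima, which is the qualitative behavior the combination of consensus mixing, gradient tracking, and variance-reduced updates is meant to deliver.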
