Paper title
Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime
Authors
Abstract
We study policy optimization for infinite-horizon discounted Markov decision processes with a softmax policy and nonlinear function approximation, trained with policy gradient algorithms. We concentrate on the training dynamics in the mean-field regime, which models, e.g., the behavior of wide single hidden layer neural networks, when exploration is encouraged through entropy regularization. The dynamics of these models are established as a Wasserstein gradient flow of distributions in parameter space. We further prove global optimality of the fixed points of these dynamics under mild conditions on their initialization.
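To make the objects named in the abstract concrete, here is a minimal sketch of a softmax policy whose logits come from a wide single hidden layer network with mean-field (1/N) scaling, together with the policy entropy that an entropy regularizer would reward. The feature map, the tanh activation, the Gaussian initialization, and all function and variable names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def mean_field_logits(feats, weights, outer):
    """Logits f(s, a) = (1/N) * sum_i c_i * tanh(w_i . x(s, a)) for one state.

    feats:   (A, d) hypothetical feature vectors x(s, a), one row per action
    weights: (N, d) hidden-layer weights w_i
    outer:   (N,)   output weights c_i
    The 1/N factor is the mean-field scaling: the network output is an
    average over neurons, i.e. an integral against their empirical distribution.
    """
    hidden = np.tanh(feats @ weights.T)        # shape (A, N)
    return hidden @ outer / weights.shape[0]   # shape (A,)

def softmax_policy(logits):
    """pi(a | s) proportional to exp(f(s, a)), computed stably."""
    z = np.exp(logits - logits.max())
    return z / z.sum()

def entropy(p):
    """Shannon entropy of a strictly positive probability vector."""
    return -np.sum(p * np.log(p))

# Assumed toy sizes and Gaussian initialization, for illustration only.
rng = np.random.default_rng(0)
num_actions, feat_dim, num_neurons = 4, 3, 1000
feats = rng.normal(size=(num_actions, feat_dim))
weights = rng.normal(size=(num_neurons, feat_dim))
outer = rng.normal(size=num_neurons)

policy = softmax_policy(mean_field_logits(feats, weights, outer))
policy_entropy = entropy(policy)
```

Because the logits depend on the neurons only through their average, training can be described as an evolution of the distribution of `(w_i, c_i)` pairs, which is the viewpoint behind the Wasserstein gradient flow mentioned in the abstract.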