Paper Title
Non-Convex Optimization via Non-Reversible Stochastic Gradient Langevin Dynamics
Paper Authors
Paper Abstract
Stochastic Gradient Langevin Dynamics (SGLD) is a powerful algorithm for optimizing a non-convex objective, where controlled and properly scaled Gaussian noise is added to the stochastic gradients to steer the iterates towards a global minimum. SGLD is based on the overdamped Langevin diffusion, which is reversible in time. By adding an anti-symmetric matrix to the drift term of the overdamped Langevin diffusion, one obtains a non-reversible diffusion that converges to the same stationary distribution at a faster rate. In this paper, we study Non-Reversible Stochastic Gradient Langevin Dynamics (NSGLD), which is based on the discretization of the non-reversible Langevin diffusion. We provide finite-time performance bounds for the global convergence of NSGLD for solving stochastic non-convex optimization problems. Our results lead to non-asymptotic guarantees for both population and empirical risk minimization problems. Numerical experiments on Bayesian independent component analysis and neural network models show that NSGLD can outperform SGLD with proper choices of the anti-symmetric matrix.
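As a minimal sketch (not the authors' implementation), the NSGLD iterate described in the abstract takes the form x_{k+1} = x_k - eta * (I + J) g_k + sqrt(2*eta/beta) * xi_k, where g_k is a stochastic gradient, J is anti-symmetric (J^T = -J), and xi_k is standard Gaussian noise; setting J = 0 recovers SGLD. The function and parameter names, the step size eta, the inverse temperature beta, and the toy objective below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def nsgld_step(x, grad_fn, J, eta=1e-3, beta=10.0):
    """One NSGLD update: x <- x - eta*(I + J) g + sqrt(2*eta/beta) * xi,
    where g is a stochastic gradient and J is anti-symmetric (J^T = -J)."""
    g = grad_fn(x)                        # stochastic gradient estimate
    drift = g + J @ g                     # (I + J) @ g; J = 0 recovers SGLD
    noise = np.sqrt(2.0 * eta / beta) * rng.standard_normal(x.shape)
    return x - eta * drift + noise

# Toy non-convex objective f(x) = (x0^2 - 1)^2 + x1^2, with exact gradient
# standing in for a stochastic one.
def grad(x):
    return np.array([4.0 * x[0] * (x[0] ** 2 - 1.0), 2.0 * x[1]])

J = np.array([[0.0, 1.0],
              [-1.0, 0.0]])              # simplest 2x2 anti-symmetric choice
x = np.array([2.0, 2.0])
for _ in range(5000):
    x = nsgld_step(x, grad, J)
```

With J fixed to zero the loop above reduces to plain SGLD, so the anti-symmetric perturbation is the only difference between the two algorithms being compared in the paper's experiments.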