Paper Title


Variational Transport: A Convergent Particle-Based Algorithm for Distributional Optimization

Authors

Zhuoran Yang, Yufeng Zhang, Yongxin Chen, Zhaoran Wang

Abstract


We consider the optimization problem of minimizing a functional defined over a family of probability distributions, where the objective functional is assumed to possess a variational form. Such distributional optimization problems arise widely in machine learning and statistics, with Monte Carlo sampling, variational inference, policy optimization, and generative adversarial networks as examples. For this problem, we propose a novel particle-based algorithm, dubbed variational transport, which approximately performs Wasserstein gradient descent over the manifold of probability distributions by iteratively pushing a set of particles. Specifically, we prove that moving along the geodesic in the direction of the functional gradient with respect to the second-order Wasserstein distance is equivalent to applying a pushforward mapping to a probability distribution, which can be approximated accurately by pushing a set of particles. In each iteration of variational transport, we first solve the variational problem associated with the objective functional using the particles, whose solution yields the Wasserstein gradient direction. We then update the current distribution by pushing each particle along the direction specified by that solution. By characterizing both the statistical error incurred in estimating the Wasserstein gradient and the progress of the optimization algorithm, we prove that when the objective functional satisfies a functional version of the Polyak-Łojasiewicz (PL) condition (Polyak, 1963) together with a smoothness condition, variational transport converges linearly to the global minimum of the objective functional up to a certain statistical error, which decays to zero sublinearly as the number of particles goes to infinity.
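The core loop the abstract describes — estimate a Wasserstein gradient direction, then push every particle a small step along it — can be illustrated with a heavily simplified sketch. The toy functional, step size, and closed-form gradient below are assumptions for illustration only; the paper's actual method estimates the gradient by solving a variational (dual) problem over the particles, which this sketch skips.

```python
import numpy as np

def variational_transport_sketch(particles, grad_potential, step=0.1, iters=200):
    """Push particles along an (assumed known) Wasserstein gradient direction.

    In the paper, grad_potential would come from solving a variational
    problem over the particles; here it is supplied in closed form.
    """
    x = np.asarray(particles, dtype=float).copy()
    for _ in range(iters):
        # Each particle is moved by the pushforward map x -> x - step * g(x),
        # the particle-level analogue of a Wasserstein gradient descent step.
        x -= step * grad_potential(x)
    return x

# Toy objective: F(p) = E_p[(x - 2)^2 / 2], whose Wasserstein gradient at a
# point x is (x - 2); the particle cloud flows toward the point mass at 2.
x0 = np.random.default_rng(0).normal(size=500)
xT = variational_transport_sketch(x0, lambda x: x - 2.0)
print(np.allclose(xT, 2.0, atol=1e-6))  # prints True
```

With this linear gradient, each step contracts the deviation from 2 by a factor of 0.9, so the particles converge geometrically — a one-dimensional caricature of the linear convergence the paper proves under the PL and smoothness conditions.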
