Paper Title
Fast Convergence for Langevin Diffusion with Manifold Structure
Paper Authors
Paper Abstract
In this paper, we study the problem of sampling from distributions of the form p(x) \propto e^{-βf(x)} for some function f whose values and gradients we can query. This mode of access to f is natural in the scenarios in which such problems arise, for instance sampling from posteriors in parametric Bayesian models. Classical results show that a natural random walk, Langevin diffusion, mixes rapidly when f is convex. Unfortunately, even in simple examples, the applications listed above will entail working with functions f that are nonconvex -- for which sampling from p may in general require an exponential number of queries. In this paper, we focus on an aspect of nonconvexity relevant for modern machine learning applications: the existence of invariances (symmetries) in the function f, as a result of which the distribution p will have manifolds of points with equal probability. First, we give a recipe for proving mixing time bounds for Langevin diffusion as a function of the geometry of these manifolds. Second, we specialize our arguments to classic matrix factorization-like Bayesian inference problems in which we get noisy measurements A(XX^T) of a low-rank matrix, i.e. f(X) = \|A(XX^T) - b\|_2^2 with X \in R^{d \times k}, and β is the inverse of the variance of the noise. Such functions f are invariant under orthogonal transformations, and include problems like matrix factorization, matrix sensing, and matrix completion. Beyond sampling, Langevin dynamics is a popular toy model for studying stochastic gradient descent. Along these lines, we believe that our work is an important first step towards understanding how SGD behaves when there is a high degree of symmetry in the space of parameters that produce the same output.
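
To make the setting concrete, below is a minimal sketch (not from the paper) of the discretized Langevin diffusion, i.e. the unadjusted Langevin algorithm X_{t+1} = X_t - ηβ∇f(X_t) + sqrt(2η) ξ_t, applied to a matrix-sensing instance of f(X) = \|A(XX^T) - b\|_2^2. The problem sizes, step size, and the random Gaussian choice of the measurement operator A are illustrative assumptions.

import numpy as np

# Minimal sketch of sampling from p(X) \propto exp(-beta * f(X)) via the
# discretized Langevin diffusion (unadjusted Langevin algorithm):
#   X_{t+1} = X_t - eta * beta * grad f(X_t) + sqrt(2 * eta) * xi_t.
# The problem sizes, step size, and Gaussian measurement operator below are
# illustrative assumptions, not the paper's construction.

rng = np.random.default_rng(0)

d, k, m = 6, 2, 20   # ambient dimension, rank, number of measurements
beta = 10.0          # inverse variance of the measurement noise
eta = 1e-6           # step size of the discretization
steps = 30_000

# Linear operator A: R^{d x d} -> R^m given by m sensing matrices, and noisy
# observations b = A(X* X*^T) + noise of a planted low-rank matrix X* X*^T.
A = rng.standard_normal((m, d, d))
X_star = rng.standard_normal((d, k))
b = np.einsum('mij,ij->m', A, X_star @ X_star.T) \
    + rng.standard_normal(m) / np.sqrt(beta)

def f(X):
    # f(X) = ||A(X X^T) - b||_2^2; invariant under X -> X O for orthogonal O.
    r = np.einsum('mij,ij->m', A, X @ X.T) - b
    return r @ r

def grad_f(X):
    # grad f(X) = 2 * (S + S^T) X, where S = sum_m r_m A_m.
    r = np.einsum('mij,ij->m', A, X @ X.T) - b
    S = np.einsum('m,mij->ij', r, A)
    return 2.0 * (S + S.T) @ X

X = rng.standard_normal((d, k))  # arbitrary initialization
for _ in range(steps):
    X = X - eta * beta * grad_f(X) \
        + np.sqrt(2.0 * eta) * rng.standard_normal((d, k))

print("f at the final Langevin iterate:", f(X))

Note the symmetry the abstract refers to: replacing X by X @ Q for any orthogonal k x k matrix Q leaves f unchanged, so the sampler must explore an entire manifold of equally probable factorizations rather than isolated modes.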