Paper Title
Fast Convergence for Langevin Diffusion with Manifold Structure
Paper Authors
Paper Abstract
In this paper, we study the problem of sampling from distributions of the form p(x) \propto e^{-βf(x)} for some function f whose values and gradients we can query. This mode of access to f is natural in the scenarios in which such problems arise, for instance sampling from posteriors in parametric Bayesian models. Classical results show that a natural random walk, Langevin diffusion, mixes rapidly when f is convex. Unfortunately, even in simple examples, the applications listed above will entail working with functions f that are nonconvex -- for which sampling from p may in general require an exponential number of queries. In this paper, we focus on an aspect of nonconvexity relevant for modern machine learning applications: the existence of invariances (symmetries) in the function f, as a result of which the distribution p will have manifolds of points with equal probability. First, we give a recipe for proving mixing time bounds for Langevin diffusion as a function of the geometry of these manifolds. Second, we specialize our arguments to classic matrix factorization-like Bayesian inference problems in which we get noisy measurements A(XX^T) of a low-rank matrix, i.e. f(X) = \|A(XX^T) - b\|_2^2 with X \in R^{d \times k}, and β is the inverse of the variance of the noise. Such functions f are invariant under orthogonal transformations, and include problems like matrix factorization, matrix sensing, and matrix completion. Beyond sampling, Langevin dynamics is a popular toy model for studying stochastic gradient descent. Along these lines, we believe that our work is an important first step towards understanding how SGD behaves when there is a high degree of symmetry in the space of parameters that produce the same output.
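
To make the setting concrete, below is a minimal sketch (not from the paper) of the discretized Langevin diffusion, i.e. the unadjusted Langevin algorithm X_{t+1} = X_t - ηβ∇f(X_t) + sqrt(2η) ξ_t, applied to a matrix-sensing instance of f(X) = \|A(XX^T) - b\|_2^2. The problem sizes, step size, and the random Gaussian choice of the measurement operator A are illustrative assumptions.

import numpy as np

# Minimal sketch of sampling from p(X) \propto exp(-beta * f(X)) via the
# discretized Langevin diffusion (unadjusted Langevin algorithm):
#   X_{t+1} = X_t - eta * beta * grad f(X_t) + sqrt(2 * eta) * xi_t.
# The problem sizes, step size, and Gaussian measurement operator below are
# illustrative assumptions, not the paper's construction.

rng = np.random.default_rng(0)

d, k, m = 6, 2, 20   # ambient dimension, rank, number of measurements
beta = 10.0          # inverse variance of the measurement noise
eta = 1e-6           # step size of the discretization
steps = 30_000

# Linear operator A: R^{d x d} -> R^m given by m sensing matrices, and noisy
# observations b = A(X* X*^T) + noise of a planted low-rank matrix X* X*^T.
A = rng.standard_normal((m, d, d))
X_star = rng.standard_normal((d, k))
b = np.einsum('mij,ij->m', A, X_star @ X_star.T) \
    + rng.standard_normal(m) / np.sqrt(beta)

def f(X):
    # f(X) = ||A(X X^T) - b||_2^2; invariant under X -> X O for orthogonal O.
    r = np.einsum('mij,ij->m', A, X @ X.T) - b
    return r @ r

def grad_f(X):
    # grad f(X) = 2 * (S + S^T) X, where S = sum_m r_m A_m.
    r = np.einsum('mij,ij->m', A, X @ X.T) - b
    S = np.einsum('m,mij->ij', r, A)
    return 2.0 * (S + S.T) @ X

X = rng.standard_normal((d, k))  # arbitrary initialization
for _ in range(steps):
    X = X - eta * beta * grad_f(X) \
        + np.sqrt(2.0 * eta) * rng.standard_normal((d, k))

print("f at the final Langevin iterate:", f(X))

Note the symmetry the abstract refers to: replacing X by X @ Q for any orthogonal k x k matrix Q leaves f unchanged, so the sampler must explore an entire manifold of equally probable factorizations rather than isolated modes.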