Paper Title

Sampling in Constrained Domains with Orthogonal-Space Variational Gradient Descent

Paper Authors

Ruqi Zhang, Qiang Liu, Xin T. Tong

Abstract

Sampling methods, as important inference and learning techniques, are typically designed for unconstrained domains. However, constraints are ubiquitous in machine learning problems, such as those on safety, fairness, robustness, and many other properties that must be satisfied to apply sampling results in real-life applications. Enforcing these constraints often leads to implicitly-defined manifolds, making efficient sampling with constraints very challenging. In this paper, we propose a new variational framework with a designed orthogonal-space gradient flow (O-Gradient) for sampling on a manifold $\mathcal{G}_0$ defined by general equality constraints. O-Gradient decomposes the gradient into two parts: one decreases the distance to $\mathcal{G}_0$ and the other decreases the KL divergence in the orthogonal space. While most existing manifold sampling methods require initialization on $\mathcal{G}_0$, O-Gradient does not require such prior knowledge. We prove that O-Gradient converges to the target constrained distribution with rate $\widetilde{O}(1/T)$, where $T$ is the number of iterations, under mild conditions. Our proof relies on a new Stein characterization of conditional measure which could be of independent interest. We implement O-Gradient through both Langevin dynamics and Stein variational gradient descent and demonstrate its effectiveness in various experiments, including Bayesian deep neural networks.
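To make the two-part gradient decomposition concrete, the following is a minimal illustrative sketch of an O-Gradient-style Langevin update, not the paper's exact algorithm: the function name, step sizes, and the attraction term $-\alpha\, g(x)\,\nabla g(x)/\|\nabla g(x)\|$ are our assumptions for illustration. One drift component pulls the iterate toward the constraint set $g(x)=0$ (the distance-decreasing part), while the sampling drift and noise are projected onto the tangent space of the level set of $g$ (the part acting in the orthogonal space).

```python
import numpy as np

def o_gradient_langevin_step(x, grad_log_p, g, grad_g, step=1e-3, alpha=1.0, rng=None):
    """One illustrative Langevin-style update in the spirit of O-Gradient.

    x          : current sample (1-D numpy array)
    grad_log_p : callable, gradient of the unconstrained log-density
    g          : callable, scalar equality constraint; the target manifold is g(x) = 0
    grad_g     : callable, gradient of g
    alpha      : weight on the constraint-attraction term (hypothetical choice)
    """
    rng = np.random.default_rng() if rng is None else rng
    n = grad_g(x)
    n = n / (np.linalg.norm(n) + 1e-12)          # unit normal to the level set of g
    # Part 1: drive g(x) toward 0, i.e. decrease the distance to the manifold G_0
    drift_normal = -alpha * g(x) * n
    # Part 2: sampling drift restricted to the tangent (orthogonal) space
    v = grad_log_p(x)
    drift_tangent = v - np.dot(v, n) * n
    # Brownian noise, also projected so it does not push the iterate off the level set
    z = rng.standard_normal(x.shape)
    noise_tangent = z - np.dot(z, n) * n
    return x + step * (drift_normal + drift_tangent) + np.sqrt(2.0 * step) * noise_tangent

# Toy usage: a standard Gaussian restricted near the unit circle g(x) = ||x||^2 - 1,
# starting from a point that is NOT on the manifold.
rng = np.random.default_rng(0)
x = np.array([2.0, 0.5])
g = lambda x: x @ x - 1.0
for _ in range(5000):
    x = o_gradient_langevin_step(x, lambda x: -x, g, lambda x: 2.0 * x,
                                 step=1e-3, alpha=10.0, rng=rng)
```

Note that the toy chain is initialized off the manifold, matching the abstract's point that, unlike most manifold sampling methods, no initialization on $\mathcal{G}_0$ is needed: the normal drift component contracts $g(x)$ toward zero while the tangent components perform the sampling.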
