论文标题

用于离散确定点过程的更快的采样器

A Faster Sampler for Discrete Determinantal Point Processes

论文作者

Barthelmé, Simon, Tremblay, Nicolas, Amblard, Pierre-Olivier

论文摘要

离散的确定点过程(DPP)具有子采样数据集的广泛潜在应用。但是,在某些情况下,它们因采样成本而被阻止。在最坏的情况下,采样成本量表为o(n^3),其中n是地面集的元素的数量。这一过高的成本的一种流行的解决方法是对低级内核定义的DPP进行采样。在这种情况下,标准采样算法的成本量表为O(np^2 + nm^2),其中m是DPP的样品(通常是M << n)的(通常M << n)的(平均)数量,p是定义DPP(M \ leq p \ leq n)的内核等级。第一个学期o(np^2)来自类似SVD的步骤。我们将重点放在此费用的第二个任期上,o(nm^2),并证明它可以将其降低到O(NM + M^3 log M)而不会损失采样的精确性。在实践中,与经典算法相比,我们在n> 1000时观察到非常实质性的加速。此处描述的算法是用于采样连续DPP的标准算法的紧密变体,并使用拒绝采样。在投影DPP的特定情况下,我们还表明可以在时间O(M^3 log M)中绘制任何其他样本。最后,分析的一个有趣的副产品是,从DPP实现的实现通常包含在使用杠杆分数I.I.D.形成的大小O(M log M)的子集中。采样。

Discrete Determinantal Point Processes (DPPs) have a wide array of potential applications for subsampling datasets. They are however held back in some cases by the high cost of sampling. In the worst-case scenario, the sampling cost scales as O(n^3) where n is the number of elements of the ground set. A popular workaround to this prohibitive cost is to sample DPPs defined by low-rank kernels. In such cases, the cost of standard sampling algorithms scales as O(np^2 + nm^2) where m is the (average) number of samples of the DPP (usually m << n) and p the rank of the kernel used to define the DPP (m \leq p \leq n). The first term, O(np^2), comes from a SVD-like step. We focus here on the second term of this cost, O(nm^2), and show that it can be brought down to O(nm + m^3 log m) without loss on the sampling's exactness. In practice, we observe very substantial speedups compared to the classical algorithm as soon as n > 1000. The algorithm described here is a close variant of the standard algorithm for sampling continuous DPPs, and uses rejection sampling. In the specific case of projection DPPs, we also show that any additional sample can be drawn in time O(m^3 log m). Finally, an interesting by-product of the analysis is that a realisation from a DPP is typically contained in a subset of size O(m log m) formed using leverage score i.i.d. sampling.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源