Title

Hilbert Curve Projection Distance for Distribution Comparison

Authors

Tao Li, Cheng Meng, Hongteng Xu, Jun Yu

Abstract

Distribution comparison plays a central role in many machine learning tasks, such as data classification and generative modeling. In this study, we propose a novel metric, called the Hilbert curve projection (HCP) distance, to measure the distance between two probability distributions with low complexity. In particular, we first project two high-dimensional probability distributions using the Hilbert curve to obtain a coupling between them, and then compute the transport distance between these two distributions in the original space according to that coupling. We show that the HCP distance is a proper metric and is well-defined for probability measures with bounded supports. Furthermore, we demonstrate that the modified empirical HCP distance with the $L_p$ cost in $d$-dimensional space converges to its population counterpart at a rate of no more than $O(n^{-1/(2\max\{d,p\})})$. To suppress the curse of dimensionality, we also develop two variants of the HCP distance using (learnable) subspace projections. Experiments on both synthetic and real-world data show that our HCP distance works as an effective surrogate of the Wasserstein distance with low complexity and overcomes the drawbacks of the sliced Wasserstein distance.
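The two steps described in the abstract (obtain a coupling by sorting along the Hilbert curve, then evaluate the transport cost in the original space) can be sketched for the simplest case. The following is a minimal, illustrative sketch, not the authors' implementation: it assumes two equal-size samples supported in $[0,1]^2$, discretizes points onto a $2^k \times 2^k$ grid, and uses the standard rotate-and-flip Hilbert-index algorithm; all function names and parameters are ours.

```python
import numpy as np

def hilbert_index_2d(order, x, y):
    """Map integer grid coordinates (x, y) in [0, 2**order) to their index
    along a 2D Hilbert curve (standard rotate-and-flip algorithm)."""
    d = 0
    s = 2 ** (order - 1)
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/flip the quadrant so the curve stays continuous
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d

def hcp_distance(X, Y, order=8, p=2):
    """Sketch of an empirical HCP-style distance for two equal-size samples
    in [0, 1]^2: sort both samples by Hilbert index to obtain a coupling,
    then average the L_p transport cost in the original space."""
    n = 2 ** order
    def keys(Z):
        G = np.clip((Z * n).astype(int), 0, n - 1)  # snap points to the grid
        return np.array([hilbert_index_2d(order, gx, gy) for gx, gy in G])
    Xs = X[np.argsort(keys(X), kind="stable")]
    Ys = Y[np.argsort(keys(Y), kind="stable")]
    costs = np.sum(np.abs(Xs - Ys) ** p, axis=1)
    return float(np.mean(costs) ** (1.0 / p))
```

Because the coupling is induced by two independent one-dimensional sorts along the curve, the cost is $O(n \log n)$ rather than the cubic cost of exact optimal transport, which is the low-complexity property the abstract emphasizes.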
