论文标题

快速,强大的比较异质空间中的概率度量

Fast and Robust Comparison of Probability Measures in Heterogeneous Spaces

论文作者

Sato, Ryoma, Cuturi, Marco, Yamada, Makoto, Kashima, Hisashi

论文摘要

比较在异质空间上支持的两个概率措施是机器学习中越来越重要的问题。当比较两个生物细胞种群时,就会出现此类问题,每个人都有自己的特征集,或者查看跨不同语言/语言训练的单词嵌入家庭时。对于这种设置,Gromov Wasserstein(GW)距离通常以黄金标准呈现。 GW是直观的,因为它量化了是否可以将一种度量映射到另一个度量。但是,其确切的计算是棘手的,声称近似值的大多数算法仍然昂贵。在\ cite {memoli-2011}上建立在\ cite {memoli-2011}上,他提议将每个分布中的每个点表示为其与所有其他点的距离的1D分布,我们在本文中介绍了锚式能量(AE)和锚定瓦斯汀(AWAW)距离,它们分别是能量和Wasterstein的距离。我们的主要贡献是提出一种扫描线算法,以在日志界限时间内计算Ae \ emph {emph {emph {priencation},在该时间,天真的实现将是立方体。这是准线性W.R.T.问题本身的描述。我们的第二个贡献是使用等级统计数据而不是原始距离的AE和AW的强大变体的提议。我们表明,AE和AW在各种实验环境中的表现良好,而流行GW近似值的计算成本的一小部分。代码可在\ url {https://github.com/joisino/anchor-energy}上找到。

Comparing two probability measures supported on heterogeneous spaces is an increasingly important problem in machine learning. Such problems arise when comparing for instance two populations of biological cells, each described with its own set of features, or when looking at families of word embeddings trained across different corpora/languages. For such settings, the Gromov Wasserstein (GW) distance is often presented as the gold standard. GW is intuitive, as it quantifies whether one measure can be isomorphically mapped to the other. However, its exact computation is intractable, and most algorithms that claim to approximate it remain expensive. Building on \cite{memoli-2011}, who proposed to represent each point in each distribution as the 1D distribution of its distances to all other points, we introduce in this paper the Anchor Energy (AE) and Anchor Wasserstein (AW) distances, which are respectively the energy and Wasserstein distances instantiated on such representations. Our main contribution is to propose a sweep line algorithm to compute AE \emph{exactly} in log-quadratic time, where a naive implementation would be cubic. This is quasi-linear w.r.t. the description of the problem itself. Our second contribution is the proposal of robust variants of AE and AW that uses rank statistics rather than the original distances. We show that AE and AW perform well in various experimental settings at a fraction of the computational cost of popular GW approximations. Code is available at \url{https://github.com/joisino/anchor-energy}.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源