Paper Title

Outlier-Robust Optimal Transport

Authors

Debarghya Mukherjee, Aritra Guha, Justin Solomon, Yuekai Sun, Mikhail Yurochkin

Abstract

Optimal transport (OT) measures distances between distributions in a way that depends on the geometry of the sample space. In light of recent advances in computational OT, OT distances are widely used as loss functions in machine learning. Despite their prevalence and advantages, OT loss functions can be extremely sensitive to outliers. In fact, a single adversarially picked outlier can increase the standard $W_2$-distance arbitrarily. To address this issue, we propose an outlier-robust formulation of OT. Our formulation is convex but challenging to scale at first glance. Our main contribution is deriving an \emph{equivalent} formulation based on cost truncation that is easy to incorporate into modern algorithms for computational OT. We demonstrate the benefits of our formulation in mean estimation problems under the Huber contamination model in simulations and in outlier detection tasks on real data.
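
To make the idea of cost truncation concrete, here is a minimal NumPy sketch, not the authors' implementation. It compares an entropy-regularized OT cost computed with the squared-distance cost against the same computation with the cost matrix capped at a threshold. The solver (log-domain Sinkhorn), the helper names `_logsumexp` and `sinkhorn_cost`, the regularization `eps`, and the threshold `tau` are all assumptions made for illustration; the abstract does not specify how the truncation level relates to the robust formulation.

```python
import numpy as np

def _logsumexp(M, axis):
    # Numerically stable log(sum(exp(M))) along one axis.
    m = M.max(axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.exp(M - m).sum(axis=axis, keepdims=True)), axis=axis)

def sinkhorn_cost(C, a, b, eps=0.5, n_iter=500):
    """Transport cost <P, C> of an entropy-regularized plan (log-domain Sinkhorn)."""
    log_a, log_b = np.log(a), np.log(b)
    f = np.zeros(len(a))
    g = np.zeros(len(b))
    for _ in range(n_iter):
        # Alternately rescale the dual potentials to match row, then column marginals.
        f = -eps * _logsumexp((g[None, :] - C) / eps + log_b[None, :], axis=1)
        g = -eps * _logsumexp((f[:, None] - C) / eps + log_a[:, None], axis=0)
    log_P = (f[:, None] + g[None, :] - C) / eps + log_a[:, None] + log_b[None, :]
    return float(np.sum(np.exp(log_P) * C))

rng = np.random.default_rng(0)
x = np.append(rng.normal(size=50), 100.0)  # 50 inliers plus one far-away outlier
y = rng.normal(size=51)                    # clean reference sample
a = np.full(len(x), 1.0 / len(x))          # uniform weights on x
b = np.full(len(y), 1.0 / len(y))          # uniform weights on y

C = (x[:, None] - y[None, :]) ** 2         # squared-distance cost, as in W_2
tau = 4.0                                  # hypothetical truncation threshold

plain = sinkhorn_cost(C, a, b)                    # dominated by the outlier
robust = sinkhorn_cost(np.minimum(C, tau), a, b)  # each unit of mass costs at most tau
print(plain, robust)
```

Because truncation only changes the cost matrix, the same solver is reused unchanged, which is the sense in which such a formulation plugs into existing computational OT routines. In this sketch the outlier's contribution is bounded by `tau` times its mass, rather than growing with its distance from the rest of the sample.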
