Paper Title

Coreset-based Strategies for Robust Center-type Problems

Authors

Andrea Pietracaprina, Geppino Pucci, Federico Soldà

Abstract

Given a dataset $V$ of points from some metric space, the popular $k$-center problem asks for a subset of $k$ points (centers) in $V$ that minimizes the maximum distance of any point of $V$ from its closest center. The \emph{robust} formulation of the problem features a further parameter $z$ and allows up to $z$ points of $V$ (outliers) to be disregarded when computing the maximum distance from the centers. In this paper, we focus on two important constrained variants of the robust $k$-center problem: the Robust Matroid Center (RMC) problem, where the set of returned centers is constrained to be an independent set of a matroid of rank $k$ built on $V$, and the Robust Knapsack Center (RKC) problem, where each element $i\in V$ is given a positive weight $w_i<1$ and the aggregate weight of the returned centers must be at most 1. We devise coreset-based strategies for the two problems which yield efficient sequential, MapReduce, and Streaming algorithms. More specifically, for any fixed $ε>0$, the algorithms return solutions featuring a $(3+ε)$-approximation ratio, which is a mere additive term $ε$ away from the 3-approximations achievable by the best known polynomial-time sequential algorithms for the two problems. Moreover, the algorithms obliviously adapt to the intrinsic complexity of the dataset, captured by its doubling dimension $D$. For wide ranges of the parameters $k,z,ε,D$, we obtain a sequential algorithm with running time linear in $|V|$, and MapReduce/Streaming algorithms with few rounds/passes and substantially sublinear local/working memory.
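
To make the objective in the abstract concrete, the following minimal sketch evaluates the robust $k$-center cost of a given set of centers (the maximum distance to the closest center after discarding the $z$ farthest points) and checks the RKC weight constraint. It is only an illustration of the problem definitions, not the paper's coreset-based algorithms; the function names `robust_radius` and `knapsack_feasible`, the use of Euclidean distance, and the toy data are assumptions made for this example.

```python
import math


def robust_radius(points, centers, z):
    """Robust k-center cost: the maximum distance from a point to its
    closest center, after disregarding the z farthest points (outliers).
    Euclidean distance is an assumption; any metric could be substituted."""
    dists = sorted(min(math.dist(p, c) for c in centers) for p in points)
    kept = dists[: len(dists) - z] if z > 0 else dists
    return kept[-1] if kept else 0.0


def knapsack_feasible(centers, weights):
    """RKC constraint: the aggregate weight of the centers is at most 1."""
    return sum(weights[c] for c in centers) <= 1


# Toy dataset (hypothetical): two tight groups and one far-away point.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0), (100.0, 0.0)]
centers = [(0.0, 0.0), (10.0, 0.0)]
weights = {(0.0, 0.0): 0.4, (10.0, 0.0): 0.5}

print(robust_radius(points, centers, z=0))  # the far point dominates the cost
print(robust_radius(points, centers, z=1))  # the far point is treated as an outlier
print(knapsack_feasible(centers, weights))
```

With no outliers allowed, the single distant point determines the cost; allowing one outlier lets the same two centers cover the remaining points at a much smaller radius, which is the motivation for the robust formulation.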
