Coreset-based Strategies for Robust Center-type Problems

Paper Title

Coreset-based Strategies for Robust Center-type Problems

Authors

Andrea Pietracaprina, Geppino Pucci, Federico Soldà

Abstract

Given a dataset $V$ of points from some metric space, the popular $k$-center problem asks for a subset of $k$ points (centers) in $V$ that minimizes the maximum distance of any point of $V$ from its closest center. The \emph{robust} formulation of the problem features a further parameter $z$ and allows up to $z$ points of $V$ (outliers) to be disregarded when computing the maximum distance from the centers. In this paper, we focus on two important constrained variants of the robust $k$-center problem: the Robust Matroid Center (RMC) problem, where the set of returned centers is constrained to be an independent set of a matroid of rank $k$ built on $V$, and the Robust Knapsack Center (RKC) problem, where each element $i\in V$ is given a positive weight $w_i<1$ and the aggregate weight of the returned centers must be at most 1. We devise coreset-based strategies for the two problems which yield efficient sequential, MapReduce, and Streaming algorithms. More specifically, for any fixed $\varepsilon>0$, the algorithms return solutions featuring a $(3+\varepsilon)$-approximation ratio, which is a mere additive term $\varepsilon$ away from the 3-approximations achievable by the best known polynomial-time sequential algorithms for the two problems. Moreover, the algorithms obliviously adapt to the intrinsic complexity of the dataset, captured by its doubling dimension $D$. For wide ranges of the parameters $k, z, \varepsilon, D$, we obtain a sequential algorithm with running time linear in $|V|$, and MapReduce/Streaming algorithms with few rounds/passes and substantially sublinear local/working memory.
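
To make the objective in the abstract concrete, the following minimal sketch evaluates a candidate set of centers under the robust formulation and checks the RKC weight constraint. It is only an illustration of the problem definition, not the coreset-based algorithms of the paper. The function names `robust_radius` and `knapsack_feasible`, and the choice of Euclidean distance as the metric, are assumptions made for this example.

```python
import math

def robust_radius(points, centers, z):
    """Maximum distance of a point from its closest center, after
    disregarding the z points farthest from the centers (outliers).
    Assumes Euclidean distance; the paper allows any metric space."""
    # Distance from each point to its closest center, in ascending order.
    dists = sorted(min(math.dist(p, c) for c in centers) for p in points)
    # Drop the z largest distances; keep nothing if z >= number of points.
    kept = dists[:max(len(dists) - z, 0)]
    return kept[-1] if kept else 0.0

def knapsack_feasible(weights, chosen):
    """RKC constraint: aggregate weight of the chosen centers is at most 1.
    `weights` maps a point index to its weight w_i; `chosen` lists indices."""
    return sum(weights[i] for i in chosen) <= 1

# Hypothetical toy data: two tight groups plus one far-away point.
points = [(0, 0), (1, 0), (10, 0), (11, 0), (100, 0)]
centers = [(0, 0), (10, 0)]
print(robust_radius(points, centers, z=0))  # the far point dominates the radius
print(robust_radius(points, centers, z=1))  # the far point is treated as an outlier
```

With `z=0` the single distant point determines the radius, while with `z=1` it is disregarded and the radius reflects the two groups only, which is the effect the parameter $z$ is meant to capture.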
