Paper Title

Clustering with Queries under Semi-Random Noise

Authors

Alberto Del Pia, Mingchen Ma, Christos Tzamos

Abstract

The seminal paper by Mazumdar and Saha \cite{MS17a} initiated an extensive line of work on clustering with noisy queries. Yet, despite significant progress on the problem, the proposed methods depend crucially on knowing the exact error probabilities of the underlying fully-random oracle. In this work, we develop robust learning methods that tolerate general semi-random noise while obtaining qualitatively the same guarantees as the best possible methods in the fully-random model. More specifically, given a set of $n$ points with an unknown underlying partition into $k$ clusters, we are allowed to query pairs of points $u,v$ to check whether they are in the same cluster, but with probability $p$ the answer may be chosen adversarially. We show that, information-theoretically, $O\left(\frac{nk \log n} {(1-2p)^2}\right)$ queries suffice to learn any cluster of sufficiently large size. Our main result is a computationally efficient algorithm that identifies large clusters using $O\left(\frac{nk \log n} {(1-2p)^2}\right) + \text{poly}\left(\log n, k, \frac{1}{1-2p} \right)$ queries, matching the guarantees of the best known algorithms in the fully-random model. As a corollary of our approach, we develop the first parameter-free algorithm for the fully-random model, answering an open question posed in \cite{MS17a}.
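To make the query model concrete, the Python sketch below simulates it under simplifying assumptions of our own; it is an illustration, not the algorithm from the paper. The names `make_semi_random_oracle`, `majority_membership`, and `majority_error_bound` are hypothetical. The oracle answers a same-cluster query correctly with probability $1-p$ and otherwise defers to an adversary, modeled here as a fixed callback (a simplification). The membership test queries a point against $m$ distinct members of a cluster that is assumed to be already known and takes a majority vote. Because the vote is correct whenever more than $m/2$ answers escape corruption, whatever the adversary returns on the rest, Hoeffding's inequality bounds its error by $\exp\left(-m(1-2p)^2/2\right)$. This hints at, but does not by itself establish, where the $(1-2p)^2$ factor in the query bounds comes from.

```python
import math
import random


def make_semi_random_oracle(labels, p, adversary, rng):
    """Same-cluster oracle under a semi-random noise model (illustrative).

    With probability 1 - p the true answer is returned; with probability p
    the answer is delegated to `adversary`, a hypothetical callback that may
    return anything it likes (including the truth).
    """
    def query(u, v):
        truth = labels[u] == labels[v]
        if rng.random() < p:
            return adversary(u, v, truth)
        return truth
    return query


def majority_membership(query, v, seed_members):
    """Guess whether v lies in the cluster that contains all of seed_members.

    Every pair (v, s) is queried exactly once. If more than half of these
    answers escaped corruption, the majority vote is correct no matter what
    the adversary returned on the corrupted ones.
    """
    yes_votes = sum(1 for s in seed_members if query(v, s))
    return 2 * yes_votes > len(seed_members)


def majority_error_bound(m, p):
    """Hoeffding bound on P(at most m/2 of m answers are uncorrupted)."""
    return math.exp(-m * (1 - 2 * p) ** 2 / 2)


def always_flip(u, v, truth):
    """A worst-case (hypothetical) adversary: always returns the wrong answer."""
    return not truth


if __name__ == "__main__":
    rng = random.Random(0)
    # Two clusters of 300 points each; labels[i] is the cluster of point i.
    labels = [0] * 300 + [1] * 300
    query = make_semi_random_oracle(labels, p=0.3, adversary=always_flip, rng=rng)
    seed = list(range(1, 201))  # 200 points assumed known to be in cluster 0
    print(majority_membership(query, 0, seed))    # point 0 is in cluster 0
    print(majority_membership(query, 450, seed))  # point 450 is in cluster 1
    print(majority_error_bound(200, 0.3))
```

With $p = 0.3$ and $m = 200$ seed members, the bound is $e^{-16}$, so both membership calls return the correct answer with overwhelming probability even against the always-lying adversary. Note that the bound degrades to the trivial value 1 as $p \to 1/2$, consistent with the $\frac{1}{(1-2p)^2}$ blow-up in the stated query complexity.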
