论文标题
稀疏图的半监督聚类:跨越信息理论阈值
Semi-Supervised Clustering of Sparse Graphs: Crossing the Information-Theoretic Threshold
论文作者
论文摘要
随机块模型是用于网络结构数据集群和社区检测的规范随机图模型。几十年来,关于该问题的广泛研究已经建立了许多深刻的结果,其中从数学和应用的角度来看,Kesten-Stigum阈值的相变尤其有趣。它指出,如果模型参数低于一定阈值,则没有基于网络拓扑的估计器可以比在稀疏图上的机会更好。然而,如果我们将地平线稍微延伸到无处不在的半监督环境,那么这种基本限制将完全消失。我们证明,通过显示标签的任意部分,在整个参数域中,检测问题是可行的。此外,我们介绍了两种有效的算法,一种组合算法,一种基于优化的组合,以将标签信息与图形结构集成在一起。我们的工作为网络和半决赛计划研究的随机模型带来了新的观点。
The stochastic block model is a canonical random graph model for clustering and community detection on network-structured data. Decades of extensive study on the problem have established many profound results, among which the phase transition at the Kesten-Stigum threshold is particularly interesting both from a mathematical and an applied standpoint. It states that no estimator based on the network topology can perform substantially better than chance on sparse graphs if the model parameter is below a certain threshold. Nevertheless, if we slightly extend the horizon to the ubiquitous semi-supervised setting, such a fundamental limitation will disappear completely. We prove that with an arbitrary fraction of the labels revealed, the detection problem is feasible throughout the parameter domain. Moreover, we introduce two efficient algorithms, one combinatorial and one based on optimization, to integrate label information with graph structures. Our work brings a new perspective to the stochastic model of networks and semidefinite program research.