Paper Title
Community Detection by Principal Components Clustering Methods
Paper Authors
Paper Abstract
Based on the classical degree-corrected stochastic blockmodel (DCSBM) for the network community detection problem, we propose two novel approaches: principal component clustering (PCC) and normalized principal component clustering (NPCC). Because it has no parameters to estimate, PCC is simple to implement. Under mild conditions, we show that PCC yields consistent community detection. NPCC combines PCC with the RSC method (Qin & Rohe 2013). A population analysis shows that NPCC returns perfect clustering in the ideal case under DCSBM. PCC and NPCC are illustrated on synthetic and real-world datasets. Numerical results show that NPCC provides a significant improvement over PCC and RSC. Moreover, NPCC inherits the nice properties of PCC and RSC: it is insensitive to the number of eigenvectors to be clustered and to the choice of the tuning parameter. For two weak-signal networks, Simmons and Caltech, we provide two refinements, PCC+ and NPCC+, of PCC and NPCC, respectively, which use one more eigenvector for clustering. Both refinements improve on their original algorithms. In particular, NPCC+ performs satisfactorily on Simmons and Caltech, with error rates of 121/1137 and 96/590, respectively.
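The error rates quoted above are counts of misclustered nodes out of the total number of nodes. The abstract does not say how the count is computed; a minimal sketch, assuming the common convention of taking the smallest number of mismatches over all relabelings of the estimated communities, might look like the following. The function name `misclustering_count` and the toy labels are hypothetical and are not taken from the paper.

```python
from itertools import permutations


def misclustering_count(true_labels, predicted_labels, num_communities):
    """Smallest number of mismatched nodes over all relabelings of the
    predicted communities (assumed convention, not confirmed by the paper)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    best = len(true_labels)
    for perm in permutations(range(num_communities)):
        # perm[k] is the true community that predicted community k is mapped to
        errors = sum(
            1 for t, p in zip(true_labels, predicted_labels) if perm[p] != t
        )
        best = min(best, errors)
    return best


# Toy example with made-up labels (not data from the paper)
true_labels = [0, 0, 0, 1, 1, 1]
predicted_labels = [1, 1, 0, 0, 0, 0]
print(misclustering_count(true_labels, predicted_labels, 2))  # prints 1

# The reported NPCC+ counts expressed as proportions
print(round(121 / 1137, 4))  # prints 0.1064
print(round(96 / 590, 4))    # prints 0.1627
```

Under this reading, the reported NPCC+ results correspond to roughly 10.6% of nodes misclustered on Simmons and 16.3% on Caltech. Enumerating permutations is only practical for a small number of communities; for larger numbers an assignment solver would be the usual substitute.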