论文标题
基于步行的社区钥匙会在大图上搜索
Random Walk-based Community Key-members Search over Large Graphs
论文作者
论文摘要
给定图形$ g $,查询节点$ q $和整数$ k $,社区搜索(CS)从包含$ q $的$ g $中寻求一个有凝聚力的子图(由$ k $ -core或$ k $ -truss等社区模型测量)。对于图形的复杂性知识较少的普通用户很难设置合适的$ k $。即使我们定义了相当大的$ k $,CS返回的社区规模通常太大,无法获得有关它的深刻见解。与整个社区相比,社区中的主要成员似乎比其他社区更有价值。为此,我们专注于社区密钥会员搜索问题(CKS)。我们将观点转向社区中包含$ Q $的关键会员,而不是整个社区。为了解决CKS问题,我们首先提出了一种基于桁架分解作为基线的精确算法。然后,我们通过仔细考虑过渡矩阵设计中的三个重要凝聚力特征,提出四种基于步行的优化算法,以实现有效性和效率之间的权衡。结果,当随机步行收敛时,我们会根据固定分布返回密钥会员。我们从理论上分析了设计粘性感知性转变矩阵的合理性,以随机行走,通过基于高斯混合模型的贝叶斯理论,具有盒子-Cox转换和copula函数拟合。此外,我们提出了一种按照``扩展替代''的方式提出一种轻巧的精炼方法,以进一步优化结果,并扩展了具有多个查询节点的CK的方法。对各种现实世界数据集的全面实验研究证明了我们方法的优势。
Given a graph $G$, a query node $q$, and an integer $k$, community search (CS) seeks a cohesive subgraph (measured by community models such as $k$-core or $k$-truss) from $G$ that contains $q$. It is difficult for ordinary users with less knowledge of graphs' complexity to set an appropriate $k$. Even if we define quite a large $k$, the community size returned by CS is often too large for users to gain much insight about it. Compared against the entire community, key-members in the community appear more valuable than others. To contend with this, we focus on Community Key-members Search problem (CKS). We turn our perspective to the key-members in the community containing $q$ instead of the entire community. To solve CKS problem, we first propose an exact algorithm based on truss decomposition as a baseline. Then, we present four random walk-based optimized algorithms to achieve a trade-off between effectiveness and efficiency, by carefully considering three important cohesiveness features in the design of transition matrix. As a result, we return key-members according to the stationary distribution when random walk converges. We theoretically analyze the rationality of designing the cohesiveness-aware transition matrix for random walk, through Bayesian theory based on Gaussian Mixture Model with Box-Cox Transformation and Copula Function Fitting. Moreover, we propose a lightweight refinement method following an ``expand-replace" manner to further optimize the result with little overhead, and we extend our method for CKS with multiple query nodes. Comprehensive experimental studies on various real-world datasets demonstrate our method's superiority.