基于步行的社区钥匙会在大图上搜索

论文标题

基于步行的社区钥匙会在大图上搜索

Random Walk-based Community Key-members Search over Large Graphs

论文作者

Wang, Yuxiang, Zhao, Yuyang, Xu, Xiaoliang, Wu, Yue, Wu, Tianxing, Ke, Xiangyu

论文摘要

给定图形$ g $，查询节点$ q $和整数$ k $，社区搜索（CS）从包含$ q $的$ g $中寻求一个有凝聚力的子图（由$ k $ -core或$ k $ -truss等社区模型测量）。对于图形的复杂性知识较少的普通用户很难设置合适的$ k $。即使我们定义了相当大的$ k $，CS返回的社区规模通常太大，无法获得有关它的深刻见解。与整个社区相比，社区中的主要成员似乎比其他社区更有价值。为此，我们专注于社区密钥会员搜索问题（CKS）。我们将观点转向社区中包含$ Q $的关键会员，而不是整个社区。为了解决CKS问题，我们首先提出了一种基于桁架分解作为基线的精确算法。然后，我们通过仔细考虑过渡矩阵设计中的三个重要凝聚力特征，提出四种基于步行的优化算法，以实现有效性和效率之间的权衡。结果，当随机步行收敛时，我们会根据固定分布返回密钥会员。我们从理论上分析了设计粘性感知性转变矩阵的合理性，以随机行走，通过基于高斯混合模型的贝叶斯理论，具有盒子-Cox转换和copula函数拟合。此外，我们提出了一种按照``扩展替代''的方式提出一种轻巧的精炼方法，以进一步优化结果，并扩展了具有多个查询节点的CK的方法。对各种现实世界数据集的全面实验研究证明了我们方法的优势。

Given a graph $G$, a query node $q$, and an integer $k$, community search (CS) seeks a cohesive subgraph (measured by community models such as $k$-core or $k$-truss) from $G$ that contains $q$. It is difficult for ordinary users with less knowledge of graphs' complexity to set an appropriate $k$. Even if we define quite a large $k$, the community size returned by CS is often too large for users to gain much insight about it. Compared against the entire community, key-members in the community appear more valuable than others. To contend with this, we focus on Community Key-members Search problem (CKS). We turn our perspective to the key-members in the community containing $q$ instead of the entire community. To solve CKS problem, we first propose an exact algorithm based on truss decomposition as a baseline. Then, we present four random walk-based optimized algorithms to achieve a trade-off between effectiveness and efficiency, by carefully considering three important cohesiveness features in the design of transition matrix. As a result, we return key-members according to the stationary distribution when random walk converges. We theoretically analyze the rationality of designing the cohesiveness-aware transition matrix for random walk, through Bayesian theory based on Gaussian Mixture Model with Box-Cox Transformation and Copula Function Fitting. Moreover, we propose a lightweight refinement method following an ``expand-replace" manner to further optimize the result with little overhead, and we extend our method for CKS with multiple query nodes. Comprehensive experimental studies on various real-world datasets demonstrate our method's superiority.

下载PDF全文

下载文献需遵守相关版权规定

论文标题