Paper Title

DSSLP: A Distributed Framework for Semi-supervised Link Prediction

Authors

Dalong Zhang, Xianzheng Song, Ziqi Liu, Zhiqiang Zhang, Xin Huang, Lin Wang, Jun Zhou

Abstract

Link prediction is widely used in a variety of industrial applications, such as merchant recommendation and fraudulent transaction detection. However, it is a great challenge to train and deploy a link prediction model on industrial-scale graphs with billions of nodes and edges. In this work, we present DSSLP, a scalable and distributed framework for the semi-supervised link prediction problem that is able to handle industrial-scale graphs. Instead of training the model on the whole graph, DSSLP trains on the $k$-hop neighborhood of nodes in a mini-batch setting, which helps reduce the scale of the input graph and distribute the training procedure. To generate negative examples efficiently, DSSLP contains a distributed batched runtime sampling module. It implements uniform and dynamic sampling approaches, and is able to adaptively construct positive and negative examples to guide the training process. Moreover, DSSLP employs a model-split strategy to accelerate inference in the link prediction task. Experimental results demonstrate the effectiveness and efficiency of DSSLP on several public datasets as well as real-world industrial-scale graphs.
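The abstract names two core ingredients: extracting the $k$-hop neighborhood of mini-batch nodes, and uniformly sampling negative (non-edge) node pairs. The paper's actual module is distributed and batched; as a rough single-machine illustration only (function names and data layout are our own assumptions, not DSSLP's API), these two ideas might be sketched as:

```python
import random
from collections import deque

def k_hop_neighborhood(adj, seeds, k):
    """BFS out to k hops from the seed nodes of a mini-batch.

    adj: dict mapping node -> list of neighbor nodes (adjacency list).
    Returns the set of nodes within k hops, which defines the small
    subgraph a worker would train on instead of the whole graph.
    """
    visited = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for nbr in adj.get(node, []):
            if nbr not in visited:
                visited.add(nbr)
                frontier.append((nbr, depth + 1))
    return visited

def uniform_negative_samples(nodes, edges, num_samples, rng=random):
    """Uniformly sample node pairs that are not existing edges.

    Observed edges serve as positive examples; rejected pairs that
    happen to be edges are re-drawn, so every returned pair is a
    valid negative example for the link prediction loss.
    """
    nodes = list(nodes)
    edge_set = {frozenset(e) for e in edges}
    negatives = []
    while len(negatives) < num_samples:
        u, v = rng.sample(nodes, 2)
        if frozenset((u, v)) not in edge_set:
            negatives.append((u, v))
    return negatives
```

In the paper's setting this sampling happens at runtime inside each batch on distributed workers; the dynamic (hardness-adaptive) variant the abstract mentions is not reproduced here.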
