Paper Title
Understanding Negative Sampling in Graph Representation Learning
Paper Authors
Paper Abstract
Graph representation learning has been extensively studied in recent years. Despite its potential for generating continuous embeddings for various networks, inferring high-quality representations for a large corpus of nodes both effectively and efficiently remains challenging. Sampling is critical to achieving these performance goals. Prior work usually focuses on sampling positive node pairs, while the strategy for negative sampling is left insufficiently explored. To bridge this gap, we systematically analyze the role of negative sampling from the perspectives of both objective and risk, theoretically demonstrating that negative sampling is as important as positive sampling in determining the optimization objective and the resulting variance. To the best of our knowledge, we are the first to derive the theory and quantify that the negative sampling distribution should be positively but sub-linearly correlated to the corresponding positive sampling distribution. Guided by this theory, we propose MCNS, which approximates the positive distribution with self-contrast approximation and accelerates negative sampling with Metropolis-Hastings. We evaluate our method on 5 datasets covering a wide range of downstream graph learning tasks, including link prediction, node classification, and personalized recommendation, for a total of 19 experimental settings. These comprehensive experimental results demonstrate the robustness and superiority of our method.
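As a rough illustration of the sampling idea the abstract describes, the sketch below draws negative nodes from a target distribution proportional to a sub-linear power of the positive distribution via a Metropolis-Hastings random walk. This is not the paper's MCNS implementation: the positive distribution is stood in for by node degree, the uniform-over-neighbors proposal and the exponent `alpha = 0.75` are assumptions, and all function and variable names are hypothetical.

```python
import random

def metropolis_hastings_negatives(adj, degrees, alpha, start, n_samples, burn_in=20):
    """Sample negative nodes with target q(x) proportional to degrees[x] ** alpha.

    adj: dict mapping each node to a non-empty list of its neighbors.
    degrees: dict mapping each node to a positive weight standing in for the
             positive distribution (here, node degree; hypothetical choice).
    alpha:   sub-linear exponent in (0, 1), e.g. 0.75 (assumed value).
    """
    current = start
    samples = []
    for step in range(burn_in + n_samples):
        # Proposal: walk to a uniformly chosen neighbor of the current node.
        proposal = random.choice(adj[current])
        # Metropolis-Hastings acceptance ratio, correcting for the
        # asymmetric proposal (unequal neighborhood sizes).
        ratio = ((degrees[proposal] ** alpha) * len(adj[current])) / \
                ((degrees[current] ** alpha) * len(adj[proposal]))
        if random.random() < min(1.0, ratio):
            current = proposal
        if step >= burn_in:  # discard burn-in steps before collecting samples
            samples.append(current)
    return samples
```

Because each step only touches the current node's neighborhood, a draw costs O(1) rather than requiring normalization over the whole node set, which is the efficiency argument for Metropolis-Hastings here.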