Paper Title
SimKGC: Simple Contrastive Knowledge Graph Completion with Pre-trained Language Models

Authors

Liang Wang, Wei Zhao, Zhuoyu Wei, Jingming Liu

Abstract
Knowledge graph completion (KGC) aims to reason over known facts and infer missing links. Text-based methods such as KG-BERT (Yao et al., 2019) learn entity representations from natural language descriptions and have the potential for inductive KGC. However, the performance of text-based methods still largely lags behind graph embedding-based methods like TransE (Bordes et al., 2013) and RotatE (Sun et al., 2019b). In this paper, we identify that the key issue is efficient contrastive learning. To improve learning efficiency, we introduce three types of negatives: in-batch negatives, pre-batch negatives, and self-negatives, which act as a simple form of hard negatives. Combined with the InfoNCE loss, our proposed model SimKGC can substantially outperform embedding-based methods on several benchmark datasets. In terms of mean reciprocal rank (MRR), we advance the state of the art by +19% on WN18RR, +6.8% on the Wikidata5M transductive setting, and +22% on the Wikidata5M inductive setting. Thorough analyses are conducted to gain insights into each component. Our code is available at https://github.com/intfloat/SimKGC.
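To make the contrastive setup concrete, the sketch below shows an InfoNCE loss with in-batch negatives, the simplest of the three negative types the abstract mentions: each (head, relation) query in a batch treats its own tail entity as the positive and every other tail in the batch as a negative. This is an illustrative NumPy sketch, not the paper's implementation; the function name `info_nce_loss` and the fixed temperature value are assumptions (SimKGC's actual training code differs, e.g. in how the temperature is handled).

```python
import numpy as np

def info_nce_loss(query_emb, tail_emb, temperature=0.05):
    """InfoNCE with in-batch negatives (illustrative sketch).

    query_emb, tail_emb: (batch, dim) L2-normalized embeddings of the
    (head, relation) query and the candidate tail entity. Row i of each
    is a positive pair; all other tails in the batch act as negatives.
    """
    # Cosine similarity matrix: logits[i, j] = sim(query_i, tail_j) / T
    logits = query_emb @ tail_emb.T / temperature
    # Log-softmax over each row, with the diagonal as the positive class
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

# Toy usage with random embeddings (names here are hypothetical)
rng = np.random.default_rng(0)
def l2norm(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

queries = l2norm(rng.normal(size=(4, 8)))
tails = l2norm(queries + 0.1 * rng.normal(size=(4, 8)))  # positives near queries
loss = info_nce_loss(queries, tails)
```

Pre-batch negatives would extend the negative pool with tail embeddings cached from recent batches, and self-negatives would add the head entity itself as a hard negative; both reduce to extra columns in the `logits` matrix.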