Paper Title
Language Models are Open Knowledge Graphs
Paper Authors
Paper Abstract
This paper shows how to construct knowledge graphs (KGs) from pre-trained language models (e.g., BERT, GPT-2/3), without human supervision. Popular KGs (e.g., Wikidata, NELL) are built in either a supervised or semi-supervised manner, requiring humans to create the knowledge. Recent deep language models automatically acquire knowledge from large-scale corpora via pre-training. The stored knowledge has enabled language models to improve downstream NLP tasks, e.g., answering questions and writing code and articles. In this paper, we propose an unsupervised method to cast the knowledge contained within language models into KGs. We show that KGs are constructed with a single forward pass of the pre-trained language models (without fine-tuning) over the corpora. We demonstrate the quality of the constructed KGs by comparing them with two KGs created by humans (Wikidata, TAC KBP). Our KGs also provide open factual knowledge that is not present in existing KGs. Our code and KGs will be made publicly available.
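
For a concrete picture of what "a single forward pass of a pre-trained language model (without fine-tuning)" looks like in practice, below is a minimal Python sketch. It is not the paper's actual extraction algorithm: it simply runs a pre-trained BERT once via the Hugging Face transformers library, averages its attention weights, and greedily picks a relation token between an assumed head/tail entity pair. The model name, example sentence, entity spans, and the greedy selection step are all illustrative assumptions.

import torch
from transformers import AutoTokenizer, AutoModel

# Assumption: bert-base-cased stands in for the pre-trained language models
# (BERT, GPT-2/3) named in the abstract; no fine-tuning is performed.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased", output_attentions=True)
model.eval()

sentence = "Paris is the capital of France."  # toy corpus sentence (assumed)
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)  # the single forward pass over the text

# Average attention over all layers and heads -> one (seq_len, seq_len) matrix.
attn = torch.stack(outputs.attentions).mean(dim=(0, 2)).squeeze(0)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
head_idx = tokens.index("Paris")   # candidate head entity (assumed, single wordpiece)
tail_idx = tokens.index("France")  # candidate tail entity (assumed, single wordpiece)

# Greedy stand-in for the paper's search over attention: pick the token between
# head and tail that the tail token attends to most strongly as the relation word.
between = range(head_idx + 1, tail_idx)
rel_idx = max(between, key=lambda i: attn[tail_idx, i].item())
print((tokens[head_idx], tokens[rel_idx], tokens[tail_idx]))  # a (head, relation, tail) candidate

The paper itself searches over the attention matrices more carefully (with beam search over candidate facts) and then maps the extracted candidates onto an existing KG schema such as Wikidata; the greedy single-token choice above is only meant to illustrate that the knowledge is read off one forward pass of a frozen model.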