Paper Title
Mining Knowledge for Natural Language Inference from Wikipedia Categories
Paper Authors
Paper Abstract
Accurate lexical entailment (LE) and natural language inference (NLI) often require large quantities of costly annotations. To alleviate the need for labeled data, we introduce WikiNLI: a resource for improving model performance on NLI and LE tasks. It contains 428,899 pairs of phrases constructed from naturally annotated category hierarchies in Wikipedia. We show that we can improve strong baselines such as BERT and RoBERTa by pretraining them on WikiNLI and then transferring the models to downstream tasks. We conduct systematic comparisons with phrases extracted from other knowledge bases such as WordNet and Wikidata, finding that pretraining on WikiNLI gives the best performance. In addition, we construct WikiNLI in other languages and show that pretraining on them improves performance on NLI tasks in the corresponding languages.
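The core idea of mining phrase pairs from a category hierarchy can be sketched as follows. This is a minimal illustration only: the toy hierarchy, the dict representation, and the pair labels are assumptions for exposition, not the paper's actual WikiNLI construction pipeline.

```python
# Hedged sketch: mine phrase pairs from a category hierarchy.
# The `hierarchy` dict (child category -> parent categories) and the
# "child-parent" / "sibling" labels are illustrative assumptions.

from itertools import combinations

hierarchy = {
    "Solar eclipses": ["Eclipses"],
    "Lunar eclipses": ["Eclipses"],
    "Eclipses": ["Astronomical events"],
}

def build_pairs(hierarchy):
    """Return (phrase_a, phrase_b, relation) triples from the hierarchy."""
    pairs = []
    # A child category paired with its parent gives an entailment-like example.
    for child, parents in hierarchy.items():
        for parent in parents:
            pairs.append((child, parent, "child-parent"))
    # Categories sharing a parent give neutral-like sibling examples.
    children_of = {}
    for child, parents in hierarchy.items():
        for parent in parents:
            children_of.setdefault(parent, []).append(child)
    for siblings in children_of.values():
        for a, b in combinations(sorted(siblings), 2):
            pairs.append((a, b, "sibling"))
    return pairs

pairs = build_pairs(hierarchy)
```

Run on the toy hierarchy above, this yields three child-parent pairs and one sibling pair; a real pipeline would walk the full Wikipedia category graph instead of a hand-written dict.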