Improving Neural Named Entity Recognition with Gazetteers

Title

Improving Neural Named Entity Recognition with Gazetteers

Authors

Chan Hee Song, Dawn Lawrie, Tim Finin, James Mayfield

Abstract

The goal of this work is to improve the performance of a neural named entity recognition (NER) system by adding input features that indicate whether a word is part of a name included in a gazetteer. This article describes how to generate gazetteers from the Wikidata knowledge graph and how to integrate the information into a neural NER system. Experiments reveal that the approach yields performance gains in two distinct languages: English, a high-resource, word-based language, and Chinese, a high-resource, character-based language. Experiments were also performed in a low-resource language, Russian, on a newly annotated Russian NER corpus from Reddit tagged with four core types and twelve extended types. This article reports a baseline score. It is a longer version of a paper presented at the 33rd FLAIRS conference (Song et al. 2020).
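As a rough illustration of what "input features that indicate a word is part of a name included in a gazetteer" could look like, the following minimal sketch tags each token by longest match against a small in-memory gazetteer. It is not the authors' implementation: the function name `gazetteer_features`, the `B-`/`I-`/`O` feature encoding, the lowercase matching, the `max_len` parameter, and the toy gazetteer entries are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def gazetteer_features(
    tokens: List[str],
    gazetteer: Dict[Tuple[str, ...], str],
    max_len: int = 5,
) -> List[str]:
    """Return one feature tag per token: 'B-TYPE', 'I-TYPE', or 'O'.

    Hypothetical helper: matches token spans against gazetteer keys
    (tuples of lowercased tokens), preferring the longest match.
    """
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in gazetteer:
                etype = gazetteer[key]
                tags[i] = "B-" + etype
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + etype
                i += n  # skip past the matched span
                matched = True
                break
        if not matched:
            i += 1
    return tags


# Toy gazetteer (assumed entries, not drawn from Wikidata).
toy_gazetteer = {("new", "york"): "GPE", ("tim", "finin"): "PER"}
sentence = ["Tim", "Finin", "visited", "New", "York", "."]
print(gazetteer_features(sentence, toy_gazetteer))
```

In a neural NER system, tags like these would typically be embedded or one-hot encoded and concatenated with the word representation for each token; how the paper itself integrates them is described in the full text, not here.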
