Enhancing Label Consistency on Document-level Named Entity Recognition

Paper Title

Enhancing Label Consistency on Document-level Named Entity Recognition

Authors

Minbyul Jeong, Jaewoo Kang

Abstract

Named entity recognition (NER) is a fundamental part of extracting information from documents in biomedical applications. A notable advantage of NER is its consistency in extracting biomedical entities in a document context. Although existing document NER models show consistent predictions, they still do not meet our expectations. We investigated whether the adjectives and prepositions within an entity cause a low label consistency, which results in inconsistent predictions. In this paper, we present our method, ConNER, which enhances the label dependency of modifiers (e.g., adjectives and prepositions) to achieve higher label agreement. ConNER refines the draft labels of the modifiers to improve the output representations of biomedical entities. The effectiveness of our method is demonstrated on four popular biomedical NER datasets; in particular, its efficacy is proved on two datasets with 7.5-8.6% absolute improvements in the F1 score. We interpret that our ConNER method is effective on datasets that have intrinsically low label consistency. In the qualitative analysis, we demonstrate how our approach makes the NER model generate consistent predictions. Our code and resources are available at https://github.com/dmis-lab/ConNER/.
