Cross-Domain Neural Entity Linking

Paper Title

Cross-Domain Neural Entity Linking

Author

Soliman, Hassan

Abstract

Entity Linking is the task of matching a mention to an entity in a given knowledge base (KB). It contributes to annotating the massive number of documents on the Web in order to harness new facts about their matched entities. However, existing Entity Linking systems focus on developing models that are typically domain-dependent and robust only to the particular knowledge base on which they have been trained. Their performance is less adequate when they are evaluated on documents and knowledge bases from different domains. Approaches based on pre-trained language models, such as Wu et al. (2020), attempt to solve the problem using a zero-shot setup, showing some potential when evaluated on a general-domain KB. Nevertheless, the performance is not equivalent when evaluated on a domain-specific KB. To allow for more accurate Entity Linking across different domains, we propose our framework: Cross-Domain Neural Entity Linking (CDNEL). Our objective is to have a single system that enables simultaneous linking to both the general-domain KB and the domain-specific KB. CDNEL works by learning a joint representation space for these knowledge bases from different domains. It is evaluated using the external Entity Linking dataset (Zeshel) constructed by Logeswaran et al. (2019) and the Reddit dataset collected by Botzer et al. (2021), to compare our proposed method with the state-of-the-art results. The proposed framework uses different types of datasets for fine-tuning, resulting in different model variants of CDNEL. When evaluated on four domains included in the Zeshel dataset, these variants achieve an average precision gain of 9%.
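
To make the idea of a joint representation space concrete, the following is a minimal sketch and not the CDNEL implementation. It assumes that mentions and entities from both knowledge bases have already been encoded as vectors in one shared space and that candidates are ranked by dot product; the abstract does not state the scoring function, so that choice is an assumption. The function `link_mention`, the entity names, and the toy vectors are all hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def link_mention(
    mention_vec: Vector, kbs: Dict[str, Dict[str, Vector]]
) -> Tuple[str, str, float]:
    """Return (kb_name, entity_id, score) for the best-scoring entity.

    All KBs are searched together, which is only meaningful because their
    entity vectors are assumed to live in the same (joint) space.
    """
    best = None
    for kb_name, entities in kbs.items():
        for entity_id, entity_vec in entities.items():
            score = dot(mention_vec, entity_vec)
            if best is None or score > best[2]:
                best = (kb_name, entity_id, score)
    if best is None:
        raise ValueError("no candidate entities were provided")
    return best


# Hypothetical toy embeddings; a real system would obtain these from a
# fine-tuned encoder rather than writing them by hand.
general_kb = {
    "Mercury_(planet)": [0.9, 0.1, 0.0],
    "Mercury_(element)": [0.1, 0.9, 0.0],
}
domain_kb = {
    "Mercury_(starship)": [0.0, 0.2, 0.9],
}
kbs = {"general": general_kb, "domain": domain_kb}

# Hypothetical encoding of a mention of "Mercury" in a fan-wiki context.
mention = [0.1, 0.1, 0.8]
result = link_mention(mention, kbs)
print(result[0], result[1])
```

In this toy setup the mention is linked to the domain-specific entity because its vector lies closest to that entity in the shared space, while the general-domain candidates remain available for mentions that point elsewhere.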
