Paper Title


Improving Candidate Generation for Low-resource Cross-lingual Entity Linking

Authors

Shuyan Zhou, Shruti Rijhwani, John Wieting, Jaime Carbonell, Graham Neubig

Abstract


Cross-lingual entity linking (XEL) is the task of finding referents in a target-language knowledge base (KB) for mentions extracted from source-language texts. The first step of (X)EL is candidate generation, which retrieves a list of plausible candidate entities from the target-language KB for each mention. Approaches based on resources from Wikipedia have proven successful in the realm of relatively high-resource languages (HRL), but these do not extend well to low-resource languages (LRL) with few, if any, Wikipedia pages. Recently, transfer learning methods have been shown to reduce the demand for resources in the LRL by utilizing resources in closely-related languages, but the performance still lags far behind their high-resource counterparts. In this paper, we first assess the problems faced by current entity candidate generation methods for low-resource XEL, then propose three improvements that (1) reduce the disconnect between entity mentions and KB entries, and (2) improve the robustness of the model to low-resource scenarios. The methods are simple, but effective: we experiment with our approach on seven XEL datasets and find that they yield an average gain of 16.9% in Top-30 gold candidate recall, compared to state-of-the-art baselines. Our improved model also yields an average gain of 7.9% in in-KB accuracy of end-to-end XEL.
