Paper Title


Improving Candidate Generation for Low-resource Cross-lingual Entity Linking

Authors

Shuyan Zhou, Shruti Rijhwani, John Wieting, Jaime Carbonell, Graham Neubig

Abstract


Cross-lingual entity linking (XEL) is the task of finding referents in a target-language knowledge base (KB) for mentions extracted from source-language texts. The first step of (X)EL is candidate generation, which retrieves a list of plausible candidate entities from the target-language KB for each mention. Approaches based on resources from Wikipedia have proven successful in the realm of relatively high-resource languages (HRL), but these do not extend well to low-resource languages (LRL) with few, if any, Wikipedia pages. Recently, transfer learning methods have been shown to reduce the demand for resources in the LRL by utilizing resources in closely-related languages, but the performance still lags far behind their high-resource counterparts. In this paper, we first assess the problems faced by current entity candidate generation methods for low-resource XEL, then propose three improvements that (1) reduce the disconnect between entity mentions and KB entries, and (2) improve the robustness of the model to low-resource scenarios. The methods are simple, but effective: we experiment with our approach on seven XEL datasets and find that they yield an average gain of 16.9% in Top-30 gold candidate recall, compared to state-of-the-art baselines. Our improved model also yields an average gain of 7.9% in in-KB accuracy of end-to-end XEL.
