Paper Title
Entity Disambiguation with Entity Definitions
Paper Authors
Paper Abstract
Local models have recently attained astounding performances in Entity Disambiguation (ED), with generative and extractive formulations being the most promising research directions. However, previous works limited their studies to using, as the textual representation of each candidate, only its Wikipedia title. Although certainly effective, this strategy presents a few critical issues, especially when titles are not sufficiently informative or distinguishable from one another. In this paper, we address this limitation and investigate to what extent more expressive textual representations can mitigate it. We thoroughly evaluate our approach against standard benchmarks in ED and find extractive formulations to be particularly well-suited to these representations: we report a new state of the art on 2 out of 6 benchmarks we consider and strongly improve the generalization capability over unseen patterns. We release our code, data and model checkpoints at https://github.com/SapienzaNLP/extend.