Entity Set Co-Expansion in StackOverflow

Paper Title

Entity Set Co-Expansion in StackOverflow

Authors

Zhang, Yu; Zhang, Yunyi; Jiang, Yucheng; Michalski, Martin; Deng, Yu; Popa, Lucian; Zhai, ChengXiang; Han, Jiawei

Abstract

Given a few seed entities of a certain type (e.g., Software or Programming Language), entity set expansion aims to discover an extensive set of entities that share the same type as the seeds. Entity set expansion in software-related domains such as StackOverflow can benefit many downstream tasks (e.g., software knowledge graph construction) and facilitate better IT operations and service management. However, existing approaches pay little attention to two problems: (1) how to handle seed entities of multiple types simultaneously, and (2) how to leverage the power of pre-trained language models (PLMs). To address these two problems, in this paper we study the entity set co-expansion task in StackOverflow, which extracts Library, OS, Application, and Language entities from StackOverflow question-answer threads. During the co-expansion process, we use PLMs to derive embeddings of candidate entities, which are then used to compute similarities between entities. Experimental results show that our proposed SECoExpan framework significantly outperforms previous approaches.
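To make the similarity step more concrete, here is a minimal Python sketch of embedding-based co-expansion. It assumes every entity already has a fixed embedding vector; in the paper these come from a PLM, while here they are made-up 3-dimensional vectors. The function `co_expand`, the seed-centroid scoring, and the rule that each candidate joins only its single best-matching type are illustrative assumptions, not the SECoExpan algorithm itself. For brevity, only three of the four entity types are used.

```python
import numpy as np


def cosine(a, b) -> float:
    """Cosine similarity between two vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def co_expand(seeds, candidates, emb, top_k=2):
    """Hypothetical co-expansion sketch (not the paper's method).

    seeds: dict mapping a type name to a list of seed entity names
    candidates: list of non-seed candidate entity names
    emb: dict mapping an entity name to its embedding (assumed PLM output)
    """
    # Represent each type by the mean embedding of its seeds.
    centroids = {t: np.mean([emb[s] for s in ss], axis=0) for t, ss in seeds.items()}
    assigned = {t: [] for t in seeds}
    for c in candidates:
        # All types compete for the candidate; it joins only the best-scoring type,
        # so the same entity is never added to two sets.
        scores = {t: cosine(emb[c], centroid) for t, centroid in centroids.items()}
        best = max(scores, key=scores.get)
        assigned[best].append((scores[best], c))
    # Keep the top_k most similar candidates per type.
    return {t: [c for _, c in sorted(pairs, reverse=True)[:top_k]]
            for t, pairs in assigned.items()}


# Made-up toy embeddings; a real system would obtain these from a PLM.
emb = {
    "numpy": [1.0, 0.1, 0.0], "pandas": [0.9, 0.0, 0.1], "requests": [0.8, 0.1, 0.1],
    "linux": [0.0, 1.0, 0.1], "windows": [0.1, 0.9, 0.0],
    "python": [0.0, 0.1, 1.0], "java": [0.1, 0.0, 0.9],
}
seeds = {"Library": ["numpy"], "OS": ["linux"], "Language": ["python"]}
candidates = ["pandas", "requests", "windows", "java"]
result = co_expand(seeds, candidates, emb, top_k=2)
print(result)
```

Expanding all types jointly, rather than running one independent expansion per type, lets the types compete for ambiguous candidates; the hard single-type assignment used here is just one simple way to express that idea.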
