Paper title
GateNLP-UShef at SemEval-2022 Task 8: Entity-Enriched Siamese Transformer for Multilingual News Article Similarity
Paper authors
Paper abstract
This paper describes the second-placed system on the leaderboard of SemEval-2022 Task 8: Multilingual News Article Similarity. We propose an entity-enriched Siamese Transformer which computes news article similarity based on different sub-dimensions, such as the shared narrative, entities, location and time of the event discussed in the news article. Our system exploits a Siamese network architecture using a Transformer encoder to learn document-level representations for the purpose of capturing the narrative together with the auxiliary entity-based features extracted from the news articles. The intuition behind using all these features together is to capture the similarity between news articles at different granularity levels and to assess the extent to which different news outlets write about "the same events". Our experimental results and detailed ablation study demonstrate the effectiveness and the validity of our proposed method.
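The idea of combining a shared-encoder (Siamese) document similarity with auxiliary entity-based features can be illustrated with a minimal sketch. This is not the authors' model. The hashed bag-of-words `encode` function is a toy stand-in for a Transformer encoder, Jaccard overlap is assumed as the entity feature, and the fixed weights `w_text` and `w_entity` are hypothetical placeholders for what a trained system would learn. All function names and the document layout (`text`, `entities`) are assumptions made for illustration.

```python
import hashlib
import math

DIM = 64  # size of the toy embedding; an arbitrary choice for this sketch


def encode(text):
    """Toy stand-in for the shared encoder: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def cosine(u, v):
    """Cosine similarity of two vectors that are already unit length (or zero)."""
    return sum(a * b for a, b in zip(u, v))


def jaccard(a, b):
    """Overlap of two entity sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def article_similarity(doc_a, doc_b, w_text=0.5, w_entity=0.5):
    """Blend narrative similarity with entity overlap using fixed (assumed) weights."""
    # Both articles pass through the same encoder, which is the Siamese part.
    text_sim = cosine(encode(doc_a["text"]), encode(doc_b["text"]))
    entity_sim = jaccard(set(doc_a["entities"]), set(doc_b["entities"]))
    return w_text * text_sim + w_entity * entity_sim


if __name__ == "__main__":
    first = {"text": "Flood hits Sheffield city centre", "entities": ["Sheffield"]}
    second = {"text": "Sheffield city centre flooded overnight", "entities": ["Sheffield"]}
    print(article_similarity(first, second))
```

Because both articles go through the same `encode` function, the score is symmetric in its two arguments. In a real system the encoder would be a pretrained multilingual Transformer and the combination would be learned from annotated article pairs rather than fixed by hand.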