MuSeM: Detecting Incongruent News Headlines using Mutual Attentive Semantic Matching

Paper title

MuSeM: Detecting Incongruent News Headlines using Mutual Attentive Semantic Matching

Authors

Mishra, Rahul, Yadav, Piyush, Calizzano, Remi, Leippold, Markus

Abstract

Measuring the congruence between two texts has several useful applications, such as detecting the deceptive and misleading news headlines that are prevalent on the web. Many works have proposed machine-learning-based solutions, such as measuring text similarity between the headline and the body text, to detect incongruence. Text-similarity-based methods fail to perform well because of inherent challenges such as the relative length mismatch between a news headline and its body content, and non-overlapping vocabulary. On the other hand, more recent works that use headline-guided attention to learn a headline-derived contextual representation of the news body also end up with a convoluted overall representation because of the news body's length. This paper proposes a method that uses inter-mutual attention-based semantic matching between the original and synthetically generated headlines, which utilizes the difference between all pairs of word embeddings of the words involved. The paper also investigates two more variations of the method, which use concatenation and dot products of the word embeddings of the words of the original and synthetic headlines. We observe that the proposed method significantly outperforms prior art on two publicly available datasets.
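
The abstract gives no formulas, so the following is only a minimal NumPy sketch of the three pairwise interactions it names (difference, concatenation, dot product) between the word embeddings of an original and a synthetic headline. The function names, the use of negative Euclidean distance as an attention score, and the softmax pooling are assumptions made for illustration; they are not taken from the paper and should not be read as the authors' model.

```python
import numpy as np

def pairwise_interactions(orig, synth, mode="difference"):
    """All-pairs interaction between two sets of word embeddings.

    orig:  (m, d) embeddings of the original headline's words
    synth: (n, d) embeddings of the synthetic headline's words
    Returns an array of shape (m, n, k); k depends on the mode.
    """
    m, d = orig.shape
    n = synth.shape[0]
    o = orig[:, None, :]   # (m, 1, d)
    s = synth[None, :, :]  # (1, n, d)
    if mode == "difference":
        return o - s                                   # (m, n, d)
    if mode == "concat":
        o_full = np.broadcast_to(o, (m, n, d))
        s_full = np.broadcast_to(s, (m, n, d))
        return np.concatenate([o_full, s_full], axis=-1)  # (m, n, 2d)
    if mode == "dot":
        return (orig @ synth.T)[:, :, None]            # (m, n, 1)
    raise ValueError(f"unknown mode: {mode}")

def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mutual_attention(orig, synth):
    """Attend in both directions using the difference interaction.

    Assumption: a pair scores higher when its embeddings are closer,
    i.e. score = -||orig_i - synth_j||.
    """
    diff = pairwise_interactions(orig, synth, "difference")
    scores = -np.linalg.norm(diff, axis=-1)        # (m, n)
    orig_ctx = softmax(scores, axis=1) @ synth     # (m, d): synthetic words seen by each original word
    synth_ctx = softmax(scores, axis=0).T @ orig   # (n, d): original words seen by each synthetic word
    return orig_ctx, synth_ctx

# Toy usage with random vectors standing in for real word embeddings.
rng = np.random.default_rng(0)
orig = rng.normal(size=(3, 4))   # 3 words, 4-dimensional embeddings
synth = rng.normal(size=(5, 4))  # 5 words
orig_ctx, synth_ctx = mutual_attention(orig, synth)
print(orig_ctx.shape, synth_ctx.shape)
```

In a trained model the scoring step would presumably be learned rather than fixed to a distance; the sketch only shows how the all-pairs tensors are formed and how attention can run in both directions over them.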
