Paper Title

Enhancing Keyphrase Extraction from Microblogs using Human Reading Time

Authors

Yingyi Zhang, Chengzhi Zhang

Abstract

Manual keyphrase annotation presupposes reading the content of the annotated object. Intuitively, when we read, more important words occupy a longer reading time. Hence, by leveraging human reading time, we can find the salient words in the corresponding content. However, previous studies on keyphrase extraction have ignored human reading features. In this article, we aim to leverage human reading time to extract keyphrases from microblog posts. There are two main tasks in this study. One is to determine how to measure the time a human spends reading a word. We use eye fixation durations extracted from an open source eye-tracking corpus (OSEC). Moreover, we propose strategies that make eye fixation duration more effective for keyphrase extraction. The other task is to determine how to integrate human reading time into keyphrase extraction models. We propose two novel neural network models. In the first, human reading time is used as the ground truth for the attention mechanism. In the second, we use human reading time as an external feature. Quantitative and qualitative experiments show that our proposed models outperform the baseline models on two microblog datasets.
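The two integration strategies described above can be illustrated with a minimal sketch. It assumes that per-word fixation durations are normalized by their sum into a distribution, that "ground truth of the attention mechanism" is realized as a cross-entropy loss between that distribution and the model's softmax attention, and that the "external feature" is one extra input dimension appended to each word vector. All function names and the example durations are hypothetical; the paper's actual normalization strategies, loss, and architecture may differ.

```python
import math
from typing import List


def normalize_durations(durations: List[float]) -> List[float]:
    """Turn per-word fixation durations (e.g. in ms) into a distribution summing to 1.

    Assumption: simple sum normalization, not necessarily the paper's strategy.
    """
    if not durations:
        return []
    total = sum(durations)
    if total <= 0:
        # No usable reading-time signal: fall back to a uniform distribution.
        return [1.0 / len(durations)] * len(durations)
    return [d / total for d in durations]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def attention_supervision_loss(attn: List[float], target: List[float],
                               eps: float = 1e-12) -> float:
    """Model 1 (hypothetical form): cross-entropy pushing attention toward reading time.

    The loss is minimized when attn equals target.
    """
    return -sum(t * math.log(a + eps) for t, a in zip(target, attn))


def add_reading_time_feature(embeddings: List[List[float]],
                             durations: List[float]) -> List[List[float]]:
    """Model 2 (hypothetical form): append normalized reading time as an extra input feature."""
    weights = normalize_durations(durations)
    return [vec + [w] for vec, w in zip(embeddings, weights)]


# Usage on a toy four-word post with made-up fixation durations (ms).
durations = [180.0, 420.0, 300.0, 100.0]
target = normalize_durations(durations)
attn = softmax([0.1, 1.2, 0.8, -0.3])
loss = attention_supervision_loss(attn, target)
feats = add_reading_time_feature([[0.5, 0.1] for _ in durations], durations)
```

In practice the supervision loss would be added, with some weight, to the main keyphrase-tagging loss, while the feature variant simply widens the encoder's input; both choices here are illustrative rather than taken from the paper.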