Paper Title

Time-aware Prompting for Text Generation

Authors

Shuyang Cao, Lu Wang

Abstract

In this paper, we study the effects of incorporating timestamps, such as document creation dates, into generation systems. Two types of time-aware prompts are investigated: (1) textual prompts that encode document timestamps in natural language sentences; and (2) linear prompts that convert timestamps into continuous vectors. To explore extrapolation to future data points, we further introduce a new data-to-text generation dataset, TempWikiBio, containing more than 4 million chronologically ordered revisions of biographical articles from English Wikipedia, each paired with structured personal profiles. Through data-to-text generation on TempWikiBio, text-to-text generation on the content transfer dataset, and summarization on XSum, we show that linear prompts on the encoder and textual prompts improve the generation quality on all datasets. Although linear prompts show a smaller performance drop when tested on data drawn from a later time, they focus more on non-temporal information and are less sensitive to the given timestamps, according to human evaluations and sensitivity analyses. Meanwhile, textual prompts establish the association between the given timestamps and the output dates, yielding more factual temporal information in the output.
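
The abstract names the two prompt types but gives no implementation details. The sketch below is only an illustration of the general idea, not the authors' method: `textual_prompt` prepends a sentence stating the document date, and `LinearPrompt` maps a timestamp to a continuous vector through a linear layer. The prompt wording, the three-feature normalization (year, month, day), the random initialization, and all names are assumptions made for this example; in a real system the weights would be learned and the vector fed to the model.

```python
from datetime import date
import random
from typing import List


def textual_prompt(timestamp: date, source: str) -> str:
    """Prepend a natural-language sentence stating the document date.

    The sentence template is a hypothetical choice for illustration.
    """
    return f"This document was created on {timestamp.isoformat()}. {source}"


def timestamp_features(timestamp: date) -> List[float]:
    """Turn a date into three roughly unit-scale scalars (assumed scheme)."""
    return [
        (timestamp.year - 2000) / 100.0,
        timestamp.month / 12.0,
        timestamp.day / 31.0,
    ]


class LinearPrompt:
    """Map a timestamp to a continuous vector with a linear layer W x + b.

    Weights are randomly initialized here; they stand in for parameters
    that would be trained together with the generation model.
    """

    def __init__(self, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.weight = [
            [rng.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(dim)
        ]
        self.bias = [0.0] * dim

    def __call__(self, timestamp: date) -> List[float]:
        x = timestamp_features(timestamp)
        return [
            sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(self.weight, self.bias)
        ]


if __name__ == "__main__":
    created = date(2020, 5, 17)
    print(textual_prompt(created, "Jane Doe is a chemist."))
    print(LinearPrompt(dim=4)(created))
```

The textual variant keeps the date visible to the model as ordinary tokens, while the linear variant hides it inside a dense vector, which is one plausible reading of why the abstract reports that textual prompts tie output dates more closely to the given timestamp.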
