Paper Title

Temporal Attention for Language Models

Authors

Guy D. Rosin, Kira Radinsky

Abstract

Pretrained language models based on the transformer architecture have shown great success in NLP. Textual training data often comes from the web and is thus tagged with time-specific information, but most language models ignore this information. They are trained on the textual data alone, limiting their ability to generalize temporally. In this work, we extend the key component of the transformer architecture, i.e., the self-attention mechanism, and propose temporal attention, a time-aware self-attention mechanism. Temporal attention can be applied to any transformer model and requires the input texts to be accompanied by their relevant time points. It allows the transformer to capture this temporal information and create time-specific contextualized word representations. We leverage these representations for the task of semantic change detection; we apply our proposed mechanism to BERT and experiment on three datasets in different languages (English, German, and Latin) that also vary in time, size, and genre. Our proposed model achieves state-of-the-art results on all the datasets.
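To make the idea of a time-aware self-attention layer concrete, below is a minimal sketch of a single attention head that takes a time point alongside each token. It is an illustration only, not the paper's exact formulation: the class name `TemporalSelfAttention`, the `time_embed` table, the `time_proj` projection, and the multiplicative combination of content and time scores are all assumptions made for this example.

```python
import math
import torch
import torch.nn as nn


class TemporalSelfAttention(nn.Module):
    """Illustrative time-aware self-attention head (hypothetical sketch).

    Assumes each token position is tagged with a discrete time point
    (e.g., the year the text was written). A learned time embedding is
    projected and its similarity scores modulate the standard scaled
    dot-product attention scores.
    """

    def __init__(self, hidden_size: int, num_time_points: int):
        super().__init__()
        self.hidden_size = hidden_size
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)
        self.value = nn.Linear(hidden_size, hidden_size)
        # Extra learned representation of time (the "temporal" part).
        self.time_embed = nn.Embedding(num_time_points, hidden_size)
        self.time_proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor, time_ids: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden); time_ids: (batch, seq_len)
        q = self.query(hidden_states)
        k = self.key(hidden_states)
        v = self.value(hidden_states)
        t = self.time_proj(self.time_embed(time_ids))

        # Standard content scores, modulated by time-time similarity scores.
        content_scores = q @ k.transpose(-2, -1)
        time_scores = t @ t.transpose(-2, -1)
        scores = (content_scores * time_scores) / math.sqrt(self.hidden_size)

        probs = torch.softmax(scores, dim=-1)
        return probs @ v
```

In practice, since the abstract states that each input text is accompanied by a single relevant time point, `time_ids` would typically repeat that one time index across all positions in the sequence, so the temporal scores shift the whole attention pattern toward the representation learned for that period.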
