Paper Title
Arithmetic-Based Pretraining -- Improving Numeracy of Pretrained Language Models
Paper Authors
Paper Abstract
State-of-the-art pretrained language models tend to perform below their capabilities when applied out-of-the-box to tasks that require understanding and working with numbers. Recent work suggests two main reasons for this: (1) popular tokenisation algorithms have limited expressiveness for numbers, and (2) common pretraining objectives do not target numeracy. Approaches that address these shortcomings usually require architectural changes or pretraining from scratch. In this paper, we propose a new extended pretraining approach called Arithmetic-Based Pretraining that jointly addresses both shortcomings in a single extended pretraining step, without requiring architectural changes or pretraining from scratch. Arithmetic-Based Pretraining combines contrastive learning to improve number representations with a novel extended pretraining objective, the Inferable Number Prediction Task, to improve numeracy. Our experiments show the effectiveness of Arithmetic-Based Pretraining on three different tasks that require improved numeracy, namely reading comprehension on the DROP dataset, inference on tables on the InfoTabs dataset, and table-to-text generation on the WikiBio and SciGen datasets.
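To make the combination of the two objectives concrete, the following is a minimal, illustrative sketch (not the paper's released code) of how a contrastive loss over number representations could be combined with a masked-number prediction loss in one pretraining step. It assumes a PyTorch setup; all function names, tensor shapes, and the weighting scheme are hypothetical.

```python
# Illustrative sketch only -- NOT the authors' implementation.
# Shows one plausible way to combine (1) an InfoNCE-style contrastive loss
# over number representations with (2) a cross-entropy loss over masked,
# inferable number tokens, as described at a high level in the abstract.
import torch
import torch.nn.functional as F


def contrastive_number_loss(anchor, positive, negatives, temperature=0.1):
    """Pull together two encodings of the same quantity (e.g. different
    surface forms of one number) and push away encodings of other quantities.

    anchor, positive: (batch, hidden) number representations
    negatives:        (batch, num_neg, hidden) representations of other numbers
    """
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    pos_sim = (anchor * positive).sum(-1, keepdim=True)       # (batch, 1)
    neg_sim = torch.einsum("bh,bnh->bn", anchor, negatives)   # (batch, num_neg)
    logits = torch.cat([pos_sim, neg_sim], dim=-1) / temperature
    # The positive example sits at index 0 of each row of logits.
    targets = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, targets)


def joint_pretraining_loss(lm_logits, masked_number_labels,
                           anchor, positive, negatives, alpha=0.5):
    """Weighted sum of the masked-number prediction loss and the
    contrastive representation loss (alpha is a hypothetical weight)."""
    prediction_loss = F.cross_entropy(
        lm_logits.reshape(-1, lm_logits.size(-1)),
        masked_number_labels.reshape(-1),
        ignore_index=-100,  # positions that are not masked numbers are ignored
    )
    representation_loss = contrastive_number_loss(anchor, positive, negatives)
    return alpha * prediction_loss + (1 - alpha) * representation_loss
```

In such a setup, the number representations would be pooled hidden states of number tokens from the pretrained encoder, and the labels would mark only those numbers that are inferable from the rest of the input; both choices are assumptions for this sketch.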