ELLE: Efficient Lifelong Pre-training for Emerging Data

Paper title

ELLE: Efficient Lifelong Pre-training for Emerging Data

Authors

Qin, Yujia; Zhang, Jiajie; Lin, Yankai; Liu, Zhiyuan; Li, Peng; Sun, Maosong; Zhou, Jie

Abstract

Current pre-trained language models (PLMs) are typically trained on static data, ignoring that in real-world scenarios, streaming data from various sources may grow continuously. This requires PLMs to integrate information from all sources in a lifelong manner. Although this goal could be achieved by exhaustive pre-training on all existing data, such a process is known to be computationally expensive. To this end, we propose ELLE, which aims at efficient lifelong pre-training for emerging data. Specifically, ELLE consists of (1) function preserved model expansion, which flexibly expands an existing PLM's width and depth to improve the efficiency of knowledge acquisition; and (2) pre-trained domain prompts, which disentangle the versatile knowledge learned during pre-training and stimulate the proper knowledge for downstream tasks. We experiment with ELLE on streaming data from 5 domains, using BERT and GPT. The results show the superiority of ELLE over various lifelong learning baselines in both pre-training efficiency and downstream performance. The code is publicly available at https://github.com/thunlp/ELLE.
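
The idea of width expansion that preserves the model's function can be illustrated on a toy network. The sketch below is not the ELLE implementation and is not taken from the paper's repository; it is a minimal Net2Net-style example on a two-layer ReLU MLP, assuming that new hidden units are created by copying existing ones and that outgoing weights are rescaled by the number of copies. All names (`forward`, `expand_width`, the layer sizes) are hypothetical and chosen only for illustration.

```python
import random


def matvec(W, x):
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def forward(W1, b1, W2, x):
    """Two-layer MLP: y = W2 @ relu(W1 @ x + b1)."""
    pre = [a + b for a, b in zip(matvec(W1, x), b1)]
    hidden = [max(0.0, a) for a in pre]
    return matvec(W2, hidden)


def expand_width(W1, b1, W2, new_hidden, rng):
    """Grow the hidden layer to `new_hidden` units without changing the output.

    Each added unit copies a randomly chosen existing unit (same incoming
    weights and bias). The outgoing weight of every unit is divided by the
    number of copies of its source unit, so the copies sum to the original
    contribution.
    """
    old_hidden = len(W1)
    assert new_hidden >= old_hidden, "this sketch only grows the layer"
    # mapping[i] = index of the original unit that new unit i copies
    extra = [rng.randrange(old_hidden) for _ in range(new_hidden - old_hidden)]
    mapping = list(range(old_hidden)) + extra
    counts = [mapping.count(j) for j in range(old_hidden)]
    new_W1 = [list(W1[j]) for j in mapping]
    new_b1 = [b1[j] for j in mapping]
    new_W2 = [[row[j] / counts[j] for j in mapping] for row in W2]
    return new_W1, new_b1, new_W2


# Toy model: 3 inputs, 4 hidden units, 2 outputs (sizes are arbitrary).
rng = random.Random(0)
W1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
b1 = [rng.uniform(-1, 1) for _ in range(4)]
W2 = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
x = [0.5, -1.0, 2.0]

before = forward(W1, b1, W2, x)
eW1, eb1, eW2 = expand_width(W1, b1, W2, 6, rng)
after = forward(eW1, eb1, eW2, x)

# The wider model should reproduce the original output up to rounding error.
max_diff = max(abs(a - b) for a, b in zip(before, after))
print("hidden units:", len(W1), "->", len(eW1))
print("max output difference:", max_diff)
```

Because the expanded network starts from the same function, further pre-training continues from the knowledge already acquired rather than from scratch. A real PLM would also need matching changes to attention heads, layer normalization, and embeddings, which this toy example does not cover.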

Paper title