Paper Title

ELLE: Efficient Lifelong Pre-training for Emerging Data

Authors

Yujia Qin, Jiajie Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou

Abstract

Current pre-trained language models (PLMs) are typically trained on static data, ignoring that in real-world scenarios, streaming data from various sources may grow continuously. This requires PLMs to integrate information from all sources in a lifelong manner. Although this goal could be achieved by exhaustive pre-training on all the existing data, such a process is known to be computationally expensive. To this end, we propose ELLE, aiming at efficient lifelong pre-training for emerging data. Specifically, ELLE consists of (1) function-preserved model expansion, which flexibly expands an existing PLM's width and depth to improve the efficiency of knowledge acquisition; and (2) pre-trained domain prompts, which disentangle the versatile knowledge learned during pre-training and stimulate the proper knowledge for downstream tasks. We experiment with ELLE on streaming data from 5 domains with BERT and GPT. The results show the superiority of ELLE over various lifelong learning baselines in both pre-training efficiency and downstream performance. The code is publicly available at https://github.com/thunlp/ELLE.
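The core idea behind function-preserved model expansion is that a wider (or deeper) network can be initialized so that it computes exactly the same function as the smaller one, and then continues training with more capacity. A minimal Net2Net-style sketch of width expansion for a two-layer MLP is shown below; the function name and shapes are illustrative and not taken from the ELLE codebase, which applies this idea to Transformer layers.

```python
import numpy as np

def widen_mlp(W1, b1, W2, new_hidden):
    """Function-preserving widening of a 2-layer MLP (illustrative sketch).

    Extra hidden units are copies of existing ones; each copied unit's
    outgoing weights are split evenly across its replicas, so the widened
    network computes exactly the same function as the original.
    """
    h = W1.shape[1]
    # pick existing hidden units to replicate for the added capacity
    extra = np.random.randint(0, h, size=new_hidden - h)
    mapping = np.concatenate([np.arange(h), extra])
    counts = np.bincount(mapping, minlength=h)   # replicas per original unit
    W1_new = W1[:, mapping]                      # duplicate incoming weights
    b1_new = b1[mapping]
    # divide each outgoing weight by the source unit's replica count
    W2_new = W2[mapping, :] / counts[mapping][:, None]
    return W1_new, b1_new, W2_new
```

Because the expanded model starts from the same function, no previously acquired knowledge is lost at the moment of expansion; depth expansion can analogously initialize new layers to act as (near-)identity maps.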
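The second component, pre-trained domain prompts, can be pictured as prepending a domain indicator to each input so the model can route to the matching slice of its knowledge. The sketch below uses literal tag strings for illustration only; in ELLE the prompts are learned embeddings, and these tag names are invented here.

```python
# Hypothetical domain tags for illustration; ELLE's actual domain
# prompts are learned embedding vectors, not literal strings.
DOMAIN_PROMPTS = {"news": "<news>", "bio": "<bio>", "review": "<review>"}

def add_domain_prompt(text, domain):
    """Prepend a domain indicator so the PLM can stimulate the
    domain-specific knowledge acquired during lifelong pre-training."""
    return f"{DOMAIN_PROMPTS[domain]} {text}"
```

At downstream fine-tuning time, the prompt for the task's source domain is attached the same way, which is what lets the disentangled knowledge be selectively reused.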
