Paper Title

As Good as New. How to Successfully Recycle English GPT-2 to Make Models for Other Languages

Paper Authors

Wietse de Vries, Malvina Nissim

Abstract

Large generative language models have been very successful for English, but other languages lag behind, in part due to data and computational limitations. We propose a method that may overcome these problems by adapting existing pre-trained models to new languages. Specifically, we describe the adaptation of English GPT-2 to Italian and Dutch by retraining lexical embeddings without tuning the Transformer layers. As a result, we obtain lexical embeddings for Italian and Dutch that are aligned with the original English lexical embeddings. Additionally, we scale up complexity by transforming relearned lexical embeddings of GPT-2 small to the GPT-2 medium embedding space. This method minimises the amount of training and prevents losing information during adaptation that was learned by GPT-2. English GPT-2 models with relearned lexical embeddings can generate realistic sentences in Italian and Dutch. Though on average these sentences are still identifiable as artificial by humans, they are assessed on par with sentences generated by a GPT-2 model fully trained from scratch.
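The core of the adaptation is architecturally simple: the Transformer layers of the English model stay frozen and only the lexical (token) embeddings are retrained on the new language. Below is a minimal sketch of that setup using the Hugging Face transformers library; this is not the authors' released code, and the Italian tokenizer path is a hypothetical placeholder.

```python
# Minimal sketch (assumption: Hugging Face transformers; not the authors' code)
# of the adaptation described in the abstract: freeze the pre-trained
# Transformer layers and retrain only the lexical embeddings.
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")  # English GPT-2 small

# Hypothetical tokenizer trained on Italian text; the path is a placeholder.
tokenizer = GPT2TokenizerFast.from_pretrained("path/to/italian-bpe-tokenizer")

# Resize the token embedding matrix to the new vocabulary. In GPT-2 the
# output (LM head) weights are tied to the input embeddings, so both change.
model.resize_token_embeddings(len(tokenizer))

# Freeze all parameters, then unfreeze only the (tied) lexical embeddings.
for param in model.parameters():
    param.requires_grad = False
model.get_input_embeddings().weight.requires_grad = True

# Training this model with a standard language-modelling objective on Italian
# text now relearns the vocabulary embeddings while leaving the Transformer
# layers, and everything GPT-2 learned in them, intact.
```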
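For the scale-up step, the abstract says the relearned GPT-2 small embeddings are transformed into the GPT-2 medium embedding space, but does not spell out the transformation. One plausible reading, sketched below under that assumption, is a linear map fitted by least squares on the two models' original English embeddings (768-dimensional for small, 1024-dimensional for medium) and then applied to the relearned embeddings; treat this as an illustration, not the paper's exact procedure.

```python
import numpy as np

def map_small_to_medium(E_small_en, E_medium_en, E_small_new):
    """Fit a linear map W minimising ||E_small_en @ W - E_medium_en||^2 on the
    shared English vocabulary, then project the relearned (e.g. Italian)
    small-space embeddings into the medium embedding space.

    E_small_en:  (V, 768)      original English GPT-2 small embeddings
    E_medium_en: (V, 1024)     original English GPT-2 medium embeddings
    E_small_new: (V_new, 768)  relearned embeddings in the small space
    """
    W, *_ = np.linalg.lstsq(E_small_en, E_medium_en, rcond=None)
    return E_small_new @ W  # (V_new, 1024) initial medium-space embeddings
```

Initialising the medium model's vocabulary this way gives it embeddings already roughly aligned with its own space, so further training starts from a much better point than random initialisation.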
