Paper Title
From English To Foreign Languages: Transferring Pre-trained Language Models
Paper Authors
Paper Abstract
Pre-trained models have demonstrated their effectiveness in many downstream natural language processing (NLP) tasks. The availability of multilingual pre-trained models enables zero-shot transfer of NLP tasks from high-resource languages to low-resource ones. However, recent research on improving pre-trained models focuses heavily on English. While it is possible to train the latest neural architectures for other languages from scratch, doing so is undesirable because of the amount of compute required. In this work, we tackle the problem of transferring an existing pre-trained model from English to other languages under a limited computational budget. With a single GPU, our approach can obtain a foreign BERT base model within a day and a foreign BERT large model within two days. Furthermore, evaluating our models on six languages, we demonstrate that they outperform multilingual BERT on two zero-shot tasks: natural language inference and dependency parsing.
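To make the transfer idea concrete, the sketch below shows one way to reuse an English pre-trained BERT for a new language: keep the Transformer body and re-initialize only the word embeddings for a foreign vocabulary, after which only target-language fine-tuning is needed. This is a minimal illustration of the general recipe described in the abstract, not the paper's exact procedure; the model name, the vocabulary size, and the copy rule are assumptions for demonstration.

```python
# Hedged sketch: build a "foreign" BERT from an English pre-trained BERT by
# copying the Transformer body and re-initializing the word embeddings for a
# new target-language vocabulary. Illustrative only; the paper's actual
# initialization and fine-tuning procedure may differ.
import torch
from transformers import BertConfig, BertModel

FOREIGN_VOCAB_SIZE = 32000  # hypothetical size of the target-language vocabulary

# Load the English pre-trained model whose body we want to reuse.
english = BertModel.from_pretrained("bert-base-cased")

# Same architecture, but sized for the foreign vocabulary.
config = BertConfig.from_pretrained("bert-base-cased")
config.vocab_size = FOREIGN_VOCAB_SIZE
foreign = BertModel(config)

# Copy every parameter whose shape matches, skipping the word embeddings,
# which depend on the vocabulary and stay randomly initialized.
english_state = english.state_dict()
foreign_state = foreign.state_dict()
for name, tensor in english_state.items():
    if "word_embeddings" in name:
        continue
    if name in foreign_state and tensor.shape == foreign_state[name].shape:
        foreign_state[name] = tensor.clone()
foreign.load_state_dict(foreign_state)

# From here, training the new embeddings (and later the full model) on
# target-language text is far cheaper than pre-training from scratch.
```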