Paper Title

Multilingual Denoising Pre-training for Neural Machine Translation

Authors

Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer

Abstract

This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART -- a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages using the BART objective. mBART is one of the first methods for pre-training a complete sequence-to-sequence model by denoising full texts in multiple languages, while previous approaches have focused only on the encoder, decoder, or reconstructing parts of the text. Pre-training a complete model allows it to be directly fine-tuned for supervised (both sentence-level and document-level) and unsupervised machine translation, with no task-specific modifications. We demonstrate that adding mBART initialization produces performance gains in all but the highest-resource settings, including up to 12 BLEU points for low-resource MT and over 5 BLEU points for many document-level and unsupervised models. We also show that it enables new types of transfer to language pairs with no bi-text or that were not in the pre-training corpus, and present extensive analysis of which factors contribute the most to effective pre-training.
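The abstract only names the BART denoising objective without spelling it out; the sketch below is a rough illustration of one noise function in that family (sentence permutation plus span masking, with each span collapsed to a single mask token). The mask ratio, span-length distribution, function names, and token string here are assumptions for illustration, not the paper's exact configuration.

```python
import math
import random

# Illustrative constants only; the paper's exact noising hyperparameters
# are not stated in the abstract, so these values are assumptions.
MASK_TOKEN = "<mask>"
MASK_RATIO = 0.35        # assumed fraction of tokens to corrupt
SPAN_LAMBDA = 3.5        # assumed mean length of masked spans

def sample_span_length(lam: float) -> int:
    """Draw a span length from a Poisson(lam) distribution (Knuth's method)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        k += 1
        p *= random.random()
        if p <= threshold:
            return k - 1

def bart_style_noise(sentences):
    """Corrupt a document: permute its sentences, then replace random token
    spans with a single mask token (text infilling)."""
    # 1) Sentence permutation: shuffle the order of sentences in the document.
    shuffled = random.sample(sentences, len(sentences))
    tokens = [tok for sent in shuffled for tok in sent]

    # 2) Text infilling: spend a masking budget on Poisson-length spans,
    #    each span collapsed to one MASK_TOKEN.
    budget = int(MASK_RATIO * len(tokens))
    noised, i = [], 0
    while i < len(tokens):
        if budget > 0 and random.random() < MASK_RATIO:
            span = min(max(1, sample_span_length(SPAN_LAMBDA)), budget)
            noised.append(MASK_TOKEN)
            i += span
            budget -= span
        else:
            noised.append(tokens[i])
            i += 1
    return noised

# Example document: the seq2seq model would be trained to reconstruct the
# original token sequence (in the original order) from this corrupted input.
doc = [["The", "weather", "was", "cold", "."], ["We", "stayed", "inside", "."]]
print(bart_style_noise(doc))
```

In multilingual pre-training, the same corruption-and-reconstruction procedure would be applied to monolingual text in every language of the corpus, with a single shared sequence-to-sequence model reconstructing the original text from the noised input.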
