Paper Title
Lego-MT: Learning Detachable Models for Massively Multilingual Machine Translation
Paper Authors
Paper Abstract
Multilingual neural machine translation (MNMT) aims to build a unified model for many language directions. Existing monolithic models for MNMT face two challenges: parameter interference among languages and inefficient inference with large models. In this paper, we revisit the classic multi-way structure and develop a detachable model by assigning each language (or group of languages) to an individual branch that supports plug-and-play training and inference. To meet the need of learning representations for all languages in a unified space, we propose a novel, efficient training recipe, upon which we build an effective detachable model, Lego-MT. For a fair comparison, we collect data from OPUS and build a translation benchmark covering 433 languages and 1.3B parallel sentence pairs. Experiments show that Lego-MT with 1.2B parameters brings an average gain of 3.2 spBLEU; it even outperforms M2M-100 with 12B parameters. The proposed training recipe brings a 28.2$\times$ speedup over conventional multi-way training.\footnote{\url{https://github.com/CONE-MT/Lego-MT}}
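To make the plug-and-play branch design concrete, below is a minimal PyTorch sketch under stated assumptions: it is not the authors' released implementation (see the repository linked in the footnote), and every name in it (LegoBranch, DetachableMT, _get_branch, branch_dir) is a hypothetical illustration. The sketch routes a source sentence through a source-language branch, a shared multilingual core, and a target-language branch, instantiating a branch only when a translation direction actually needs it.

```python
# Minimal sketch (not the paper's released code) of the plug-and-play idea:
# each language (or language group) gets its own detachable branch, and only
# the branches needed for a given direction are held in memory. All class and
# attribute names here are illustrative assumptions.
import torch
import torch.nn as nn


class LegoBranch(nn.Module):
    """One detachable branch: a small Transformer stack for one language group."""

    def __init__(self, d_model: int = 512, nhead: int = 8, num_layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.stack = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.stack(x)


class DetachableMT(nn.Module):
    """Routes each direction through source- and target-specific branches
    around a shared multilingual core, loading branches on demand."""

    def __init__(self, branch_dir: str = "branches"):
        super().__init__()
        self.branch_dir = branch_dir          # where per-language weights would live
        self.core = LegoBranch()              # shared branch: the unified space
        self.loaded: dict[str, LegoBranch] = {}

    def _get_branch(self, lang: str) -> LegoBranch:
        # Plug-and-play: only the branch for `lang` is instantiated. In a real
        # system its trained weights would be read from disk here, e.g. via
        # torch.load(f"{self.branch_dir}/{lang}.pt"); this sketch uses a fresh
        # randomly initialized branch instead.
        if lang not in self.loaded:
            self.loaded[lang] = LegoBranch()
        return self.loaded[lang]

    def forward(self, src_emb: torch.Tensor, src_lang: str, tgt_lang: str) -> torch.Tensor:
        h = self._get_branch(src_lang)(src_emb)   # language-specific encoding
        h = self.core(h)                          # map into the shared unified space
        return self._get_branch(tgt_lang)(h)      # language-specific decoding side


model = DetachableMT()
x = torch.randn(2, 10, 512)                       # (batch, seq, d_model) dummy embeddings
out = model(x, src_lang="de", tgt_lang="zh")
print(out.shape)                                  # torch.Size([2, 10, 512])
```

Because each branch is an independent nn.Module, a new language could be supported by training and shipping only its branch while the shared core and all other branches stay untouched; that modularity is the detachability the abstract refers to.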