Paper title
MMTAfrica: Multilingual Machine Translation for African Languages
Paper authors
Paper abstract
In this paper, we focus on the task of multilingual machine translation for African languages and describe our contribution to the 2021 WMT Shared Task: Large-Scale Multilingual Machine Translation. We introduce MMTAfrica, the first many-to-many multilingual translation system for six African languages: Fon (fon), Igbo (ibo), Kinyarwanda (kin), Swahili/Kiswahili (swa), Xhosa (xho), and Yoruba (yor), and two non-African languages: English (eng) and French (fra). For multilingual translation concerning African languages, we introduce a novel backtranslation and reconstruction objective, BT\&REC, inspired by the random online back translation and T5 modeling framework, respectively, to effectively leverage monolingual data. Additionally, we report improvements from MMTAfrica over the FLORES 101 benchmarks (spBLEU gains ranging from $+0.58$ in Swahili to French to $+19.46$ in French to Xhosa). We release our dataset and source code at https://github.com/edaiofficial/mmtafrica.