Paper Title
Mu$^{2}$SLAM: Multitask, Multilingual Speech and Language Models
Paper Authors
Paper Abstract
We present Mu$^{2}$SLAM, a multilingual sequence-to-sequence model pre-trained jointly on unlabeled speech, unlabeled text and supervised data spanning Automatic Speech Recognition (ASR), Automatic Speech Translation (AST) and Machine Translation (MT), in over 100 languages. By leveraging a quantized representation of speech as a target, Mu$^{2}$SLAM trains the speech-text models with a sequence-to-sequence masked denoising objective similar to T5 on the decoder and a masked language modeling (MLM) objective on the encoder, for both unlabeled speech and text, while utilizing the supervised tasks to improve cross-lingual and cross-modal representation alignment within the model. On CoVoST AST, Mu$^{2}$SLAM establishes a new state-of-the-art for models trained on public datasets, improving on xx-en translation over the previous best by 1.9 BLEU points and on en-xx translation by 1.1 BLEU points. On VoxPopuli ASR, our model matches the performance of an mSLAM model fine-tuned with an RNN-T decoder, despite using a relatively weaker sequence-to-sequence architecture. On text understanding tasks, our model improves by more than 6\% over mSLAM on XNLI, getting closer to the performance of mT5 models of comparable capacity on XNLI and TyDi QA, paving the way towards a single model for all speech and text understanding tasks.
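To make the pre-training recipe concrete, the sketch below illustrates the kind of T5-style span-denoising objective the abstract describes, applied to a sequence of quantized speech units: contiguous spans are replaced by sentinel tokens in the encoder input, and the decoder target is the sequence of sentinels each followed by the span it replaced. This is a minimal, hypothetical illustration only; the vocabulary layout, codebook size, masking rate, and span length are assumptions for demonstration and are not taken from the paper.

```python
import random

# Hypothetical vocabulary layout (an illustrative assumption, not the paper's ids):
#   0 .. NUM_UNITS-1   -> quantized speech units from a learned codebook
#   SENTINEL_BASE ..   -> sentinel tokens for the span-denoising objective
NUM_UNITS = 1024
SENTINEL_BASE = 1024

def span_denoise(tokens, mask_rate=0.15, mean_span=3):
    """T5-style span corruption: replace contiguous spans with sentinels.

    Returns (encoder_input, decoder_target). The decoder learns to emit
    each sentinel followed by the original tokens of the span it replaced.
    """
    n = len(tokens)
    budget = max(1, int(n * mask_rate))  # total tokens to mask
    enc, dec = [], []
    i, sentinel, masked = 0, SENTINEL_BASE, 0
    while i < n:
        if masked < budget and random.random() < mask_rate:
            span = min(mean_span, n - i, budget - masked)
            enc.append(sentinel)             # span collapsed to one sentinel
            dec.append(sentinel)             # decoder predicts sentinel ...
            dec.extend(tokens[i:i + span])   # ... then the masked tokens
            sentinel += 1
            masked += span
            i += span
        else:
            enc.append(tokens[i])            # unmasked token passes through
            i += 1
    return enc, dec

# Toy example with random stand-ins for quantized speech units:
speech_units = [random.randrange(NUM_UNITS) for _ in range(20)]
enc_in, dec_tgt = span_denoise(speech_units)
print("encoder input :", enc_in)
print("decoder target:", dec_tgt)
```

Because both speech (via its quantized units) and text reduce to token sequences, the same corruption scheme can be applied to either modality, which is what allows a single encoder-decoder to be trained jointly on both, alongside an MLM-style objective on the encoder.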