Language-agnostic Code-Switching in Sequence-To-Sequence Speech Recognition

Paper title

Language-agnostic Code-Switching in Sequence-To-Sequence Speech Recognition

Authors

Enes Yavuz Ugan, Christian Huber, Juan Hussain, Alexander Waibel

Abstract

Code-switching (CS) refers to the phenomenon of alternately using words and phrases from different languages. While today's neural end-to-end (E2E) models deliver state-of-the-art performance on automatic speech recognition (ASR), these systems are known to be very data-intensive. However, only a small amount of transcribed and aligned CS speech is available. To overcome this problem and train multilingual systems that can transcribe CS speech, we propose a simple yet effective data augmentation in which audio and the corresponding labels from different source languages are concatenated. Trained on this data, our E2E model improves at transcribing CS speech. It also surpasses monolingual models on monolingual tests. The results show that this augmentation technique can even improve the model's performance on inter-sentential language switches not seen during training, by 5.03% WER.
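The concatenation idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how utterance pairs are chosen, whether silence is inserted between them, or how language labels are handled, so everything below is an assumption for illustration: the `Utterance` type, the `concatenate_pair` and `augment` helpers, uniform random pairing, and plain lists of floats standing in for real audio.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Utterance:
    """Hypothetical container for one monolingual training example."""
    samples: List[float]  # mono waveform samples (stand-in for real audio)
    transcript: str       # reference label sequence
    language: str         # language tag, e.g. "de" or "en"


def concatenate_pair(first: Utterance, second: Utterance) -> Utterance:
    """Join audio and labels of two utterances into one artificial CS example."""
    return Utterance(
        samples=first.samples + second.samples,
        transcript=f"{first.transcript} {second.transcript}",
        language=f"{first.language}+{second.language}",
    )


def augment(corpus: Sequence[Utterance], num_examples: int, seed: int = 0) -> List[Utterance]:
    """Sample utterance pairs from two different languages and concatenate them.

    Uniform random pairing is an assumption; the abstract does not specify it.
    """
    rng = random.Random(seed)
    by_language: Dict[str, List[Utterance]] = {}
    for utt in corpus:
        by_language.setdefault(utt.language, []).append(utt)
    if len(by_language) < 2:
        raise ValueError("need utterances from at least two languages")

    languages = sorted(by_language)
    augmented = []
    for _ in range(num_examples):
        lang_a, lang_b = rng.sample(languages, 2)  # two distinct languages
        augmented.append(
            concatenate_pair(rng.choice(by_language[lang_a]),
                             rng.choice(by_language[lang_b]))
        )
    return augmented
```

In this sketch the augmented examples would simply be added to the monolingual training set; with real data the waveform lists would be replaced by arrays loaded from audio files at a common sampling rate.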
