Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition

Paper Title

Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition

Authors

Yu Zhang, James Qin, Daniel S. Park, Wei Han, Chung-Cheng Chiu, Ruoming Pang, Quoc V. Le, Yonghui Wu

Abstract

We employ a combination of recent developments in semi-supervised learning for automatic speech recognition to obtain state-of-the-art results on LibriSpeech utilizing the unlabeled audio of the Libri-Light dataset. More precisely, we carry out noisy student training with SpecAugment using giant Conformer models pre-trained using wav2vec 2.0 pre-training. By doing so, we are able to achieve word-error-rates (WERs) 1.4%/2.6% on the LibriSpeech test/test-other sets against the current state-of-the-art WERs 1.7%/3.3%.
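The results above are reported as word error rate (WER). As a rough illustration of the metric only, and not the authors' evaluation code, the sketch below computes WER for a single utterance as the word-level edit distance divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative sketch: tokens are obtained by whitespace splitting,
    with no text normalization (an assumption of this example).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Corpus-level figures such as those quoted in the abstract are normally obtained by summing edit counts over all utterances and dividing by the total number of reference words, rather than by averaging per-utterance values.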
