Paper Title
Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition
Paper Authors
Abstract
We combine recent developments in semi-supervised learning for automatic speech recognition to obtain state-of-the-art results on LibriSpeech, utilizing the unlabeled audio of the Libri-Light dataset. More precisely, we carry out noisy student training with SpecAugment using giant Conformer models pre-trained with wav2vec 2.0. By doing so, we achieve word error rates (WERs) of 1.4%/2.6% on the LibriSpeech test/test-other sets, against the current state-of-the-art WERs of 1.7%/3.3%.
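The metric quoted above can be made concrete with a small sketch. The code below is not taken from the paper: it is a minimal illustration of the standard word error rate definition (word-level edit distance divided by the number of reference words), plus a helper for the relative reduction between two WERs. The function names `word_error_rate` and `relative_reduction` are hypothetical, and the sketch assumes whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are split on whitespace and compared exactly;
    real evaluation pipelines usually normalize the text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    # WER pairs quoted in the abstract (in percent).
    print(relative_reduction(1.7, 1.4))
    print(relative_reduction(3.3, 2.6))
```

Applied to the figures in the abstract, the relative reductions come to roughly 18% on test (1.7% to 1.4%) and roughly 21% on test-other (3.3% to 2.6%).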