Paper Title

SlimIPL: Language-Model-Free Iterative Pseudo-Labeling

Authors

Tatiana Likhomanenko, Qiantong Xu, Jacob Kahn, Gabriel Synnaeve, Ronan Collobert

Abstract

Recent results in end-to-end automatic speech recognition have demonstrated the efficacy of pseudo-labeling for semi-supervised models trained both with Connectionist Temporal Classification (CTC) and Sequence-to-Sequence (seq2seq) losses. Iterative Pseudo-Labeling (IPL), which continuously trains a single model using pseudo-labels iteratively re-generated as the model learns, has been shown to further improve performance in ASR. We improve upon the IPL algorithm: as the model learns, we propose to iteratively re-generate transcriptions with hard labels (the most probable tokens), that is, without a language model. We call this approach Language-Model-Free IPL (slimIPL) and give a resultant training setup for low-resource settings with CTC-based models. slimIPL features a dynamic cache for pseudo-labels which reduces sensitivity to changes in relabeling hyperparameters and results in improved training stability. slimIPL is also highly efficient and requires 3.5-4x fewer computational resources to converge than other state-of-the-art semi/self-supervised approaches. With only 10 hours of labeled audio, slimIPL is competitive with self-supervised approaches, and is state-of-the-art with 100 hours of labeled audio without the use of a language model both at test time and during pseudo-label generation.
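The abstract describes two key ingredients: hard-label pseudo-labels obtained by greedy (argmax) CTC decoding with no language model, and a fixed-size dynamic cache of pseudo-labeled samples that rotates as the model relabels new data. The sketch below illustrates that loop under stated assumptions; the toy model, data, and names (hard_labels, CACHE_SIZE, P_CACHE, etc.) are illustrative stand-ins, not the paper's actual implementation or hyperparameters.

```python
# Minimal sketch of a slimIPL-style loop: greedy hard-label pseudo-labeling
# (no language model) plus a fixed-size dynamic cache. All components here
# (toy model, random data, cache policy details) are illustrative assumptions.
import random
import torch
import torch.nn as nn

BLANK = 0

def hard_labels(log_probs, blank=BLANK):
    """Greedy CTC decode: per-frame argmax, collapse repeats, drop blanks."""
    out, prev = [], None
    for t in log_probs.argmax(dim=-1).tolist():
        if t != prev and t != blank:
            out.append(t)
        prev = t
    return out

# Toy CTC model over a 10-token vocabulary; the paper uses a transformer AM.
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
ctc = nn.CTCLoss(blank=BLANK, zero_infinity=True)

def ctc_step(feats, targets):
    """One CTC gradient step on a single (features, token list) pair."""
    log_probs = model(feats).log_softmax(-1)          # (T, vocab)
    loss = ctc(log_probs.unsqueeze(1),                # (T, N=1, vocab)
               torch.tensor([targets]),
               torch.tensor([log_probs.size(0)]),
               torch.tensor([len(targets)]))
    opt.zero_grad()
    loss.backward()
    opt.step()

def pseudo_label(feats):
    """Hard labels from the current model state, with no language model."""
    with torch.no_grad():
        return hard_labels(model(feats).log_softmax(-1))

# Dynamic cache of (features, pseudo-label) pairs; sizes are illustrative.
CACHE_SIZE, P_CACHE = 4, 0.5
cache = []

labeled = [(torch.randn(50, 16), [1, 2, 3]) for _ in range(8)]   # toy data
unlabeled = [torch.randn(50, 16) for _ in range(32)]

for step in range(100):
    ctc_step(*random.choice(labeled))                 # supervised update
    u = random.choice(unlabeled)
    if len(cache) < CACHE_SIZE:
        cache.append((u, pseudo_label(u)))            # fill the cache first
    elif random.random() < P_CACHE:
        # Train on a cached pseudo-labeled sample, then replace it with a
        # freshly relabeled one so stale labels gradually rotate out.
        i = random.randrange(CACHE_SIZE)
        feats, pl = cache[i]
        if pl:                                        # skip empty transcripts
            ctc_step(feats, pl)
        cache[i] = (u, pseudo_label(u))
```

Because cached samples were labeled by an earlier model state, the cache decouples pseudo-label generation from the current step, which is what reduces sensitivity to relabeling hyperparameters in the paper's account; the exact replacement policy above is a simplification.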
