Title
The ZevoMOS entry to VoiceMOS Challenge 2022
Authors
Abstract
This paper introduces the ZevoMOS entry to the main track of the VoiceMOS Challenge 2022. The ZevoMOS submission is based on a two-step finetuning of pretrained self-supervised learning (SSL) speech models. The first step uses the task of classifying natural versus synthetic speech, while the second step's task is to predict the MOS score associated with each training sample. The outputs of the finetuning process are then combined with confidence scores extracted from an automatic speech recognition model, as well as with the raw embeddings of the training samples obtained from a wav2vec SSL speech model. The team ID assigned to the ZevoMOS system within the VoiceMOS Challenge is T01. The submission ranked 14th with respect to the system-level Spearman rank correlation coefficient (SRCC) and 9th with respect to the utterance-level mean squared error (MSE). The paper also presents additional evaluations of the intermediate results.