Paper title

SESQA: semi-supervised learning for speech quality assessment

Authors

Joan Serrà, Jordi Pons, Santiago Pascual

Abstract

Automatic speech quality assessment is an important, transversal task whose progress is hampered by the scarcity of human annotations, poor generalization to unseen recording conditions, and a lack of flexibility of existing approaches. In this work, we tackle these problems with a semi-supervised learning approach, combining available annotations with programmatically generated data, and using 3 different optimization criteria together with 5 complementary auxiliary tasks. Our results show that such a semi-supervised approach can cut the error of existing methods by more than 36%, while providing additional benefits in terms of reusable features or auxiliary outputs. Improvement is further corroborated with an out-of-sample test showing promising generalization capabilities.
