SESQA: Semi-Supervised Learning for Speech Quality Assessment

Paper title

SESQA: semi-supervised learning for speech quality assessment

Authors

Serrà, Joan; Pons, Jordi; Pascual, Santiago

Abstract

Automatic speech quality assessment is an important, transversal task whose progress is hampered by the scarcity of human annotations, poor generalization to unseen recording conditions, and a lack of flexibility of existing approaches. In this work, we tackle these problems with a semi-supervised learning approach, combining available annotations with programmatically generated data, and using 3 different optimization criteria together with 5 complementary auxiliary tasks. Our results show that such a semi-supervised approach can cut the error of existing methods by more than 36%, while providing additional benefits in terms of reusable features or auxiliary outputs. Improvement is further corroborated with an out-of-sample test showing promising generalization capabilities.
