Paper Title
Exploring WavLM on Speech Enhancement
Paper Authors
Paper Abstract
In recent years there has been a surge of interest in self-supervised learning approaches to end-to-end speech encoding, as they have achieved great success. In particular, WavLM has shown state-of-the-art performance on various speech processing tasks. To better understand the efficacy of self-supervised learning models for speech enhancement, in this work we design and conduct a series of experiments under three resource conditions by combining WavLM with two high-quality speech enhancement systems. We also propose a regression-based WavLM training objective and a noise-mixing data configuration to further boost downstream enhancement performance. Experiments on the DNS challenge dataset and a simulated dataset show that WavLM benefits the speech enhancement task in terms of both speech quality and speech recognition accuracy, especially under low fine-tuning resources. Under the high fine-tuning resource condition, only the word error rate is substantially improved.
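The abstract mentions a noise-mixing data configuration for training. A common way to build such training pairs is to add noise to clean speech at a sampled signal-to-noise ratio; the sketch below illustrates that general recipe only — the function name and details are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add `noise` to `clean` at the requested SNR (in dB).

    Illustrative sketch of a noise-mixing step; not the paper's
    implementation. Both inputs are 1-D float waveforms.
    """
    # Tile or truncate the noise to match the clean signal's length.
    if len(noise) < len(clean):
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(clean)]

    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # avoid division by zero
    # Gain so that 10*log10(clean_power / (gain^2 * noise_power)) == snr_db.
    gain = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + gain * noise
```

In a training loop one would typically draw `snr_db` at random per utterance (e.g. uniformly over a range such as 0–20 dB) so the enhancement model sees a spread of noise levels.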