Paper title
Conditioning and Sampling in Variational Diffusion Models for Speech Super-Resolution
Paper authors
Abstract
Recently, diffusion models (DMs) have been increasingly used in audio processing tasks, including speech super-resolution (SR), which aims to restore high-frequency content given low-resolution speech utterances. This is commonly achieved by conditioning the noise predictor network on the low-resolution audio. In this paper, we propose a novel sampling algorithm that communicates the information of the low-resolution audio via the reverse sampling process of DMs. The proposed method can serve as a drop-in replacement for the vanilla sampling process and can significantly improve the performance of existing methods. Moreover, by coupling the proposed sampling method with an unconditional DM, i.e., a DM with no auxiliary inputs to its noise predictor, we can generalize it to a wide range of SR setups. We also attain state-of-the-art results on the VCTK Multi-Speaker benchmark with this novel formulation.
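The abstract does not spell out the sampling algorithm, so the following is only an illustrative sketch of one generic way low-resolution information could be injected during reverse sampling: after a denoising update, the low-frequency band of the intermediate sample is overwritten with the corresponding band of a reference signal. The function name `replace_low_band`, the `cutoff` parameter, and the use of a naive DFT are assumptions made for this example and are not taken from the paper.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT; returns the real part, which is exact for Hermitian spectra."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def replace_low_band(x_t, y_t, cutoff):
    """Return x_t with its low-frequency DFT bins replaced by those of y_t.

    Hypothetical helper: x_t stands for an intermediate sample of the reverse
    process, y_t for a reference derived from the low-resolution audio, and
    cutoff for the number of low-frequency bins to overwrite.
    """
    n = len(x_t)
    spec_x, spec_y = dft(x_t), dft(y_t)
    for k in range(n):
        # For real signals, bins k and n - k describe the same frequency,
        # so both are replaced to keep the spectrum Hermitian.
        if min(k, n - k) < cutoff:
            spec_x[k] = spec_y[k]
    return idft(spec_x)
```

In a sampler, a step like this would presumably be called once per reverse iteration, with `y_t` being a suitably noised copy of the upsampled low-resolution signal; that placement is an assumption here. A practical implementation would use an FFT library rather than the quadratic-time transform shown above.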