Conditioning and Sampling in Variational Diffusion Models for Speech Super-Resolution

Paper title

Conditioning and Sampling in Variational Diffusion Models for Speech Super-Resolution

Authors

Chin-Yun Yu, Sung-Lin Yeh, György Fazekas, Hao Tang

Abstract

Recently, diffusion models (DMs) have been increasingly used in audio processing tasks, including speech super-resolution (SR), which aims to restore high-frequency content given low-resolution speech utterances. This is commonly achieved by conditioning the noise predictor network on the low-resolution audio. In this paper, we propose a novel sampling algorithm that conveys the information in the low-resolution audio through the reverse sampling process of DMs. The proposed method can be a drop-in replacement for the vanilla sampling process and can significantly improve the performance of existing methods. Moreover, by coupling the proposed sampling method with an unconditional DM, i.e., a DM with no auxiliary inputs to its noise predictor, we can generalize it to a wide range of SR setups. We also attain state-of-the-art results on the VCTK Multi-Speaker benchmark with this novel formulation.
