Deep Audio Waveform Prior

Paper Title

Deep Audio Waveform Prior

Authors

Arnon Turetzky, Tzvi Michelson, Yossi Adi, Shmuel Peleg

Abstract

Convolutional neural networks contain strong priors for generating natural-looking images [1]. These priors enable image denoising, super-resolution, and inpainting in an unsupervised manner. Previous attempts to demonstrate similar ideas in audio, namely deep audio priors, (i) use hand-picked architectures such as harmonic convolutions, (ii) only work with spectrogram input, and (iii) have been used mostly for eliminating Gaussian noise [2]. In this work we show that existing state-of-the-art (SOTA) architectures for audio source separation contain deep priors even when working with the raw waveform. Deep priors can be discovered by training a neural network to generate a single corrupted signal when given white noise as input. A network with relevant deep priors is likely to generate a cleaner version of the signal before converging on the corrupted signal. We demonstrate this restoration effect with several corruptions: background noise, reverberation, and a gap in the signal (audio inpainting).
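
The fitting procedure described in the abstract (a fixed white-noise input, a single corrupted target, and intermediate outputs inspected before convergence) can be sketched roughly as follows. This is a minimal illustration in PyTorch, not the authors' implementation: the function name `fit_deep_prior`, the tiny `Conv1d` stack, the MSE loss, and all hyperparameters are assumptions made here for brevity. The abstract attributes the prior to existing source separation architectures, so a network this small may well not show the restoration effect; the sketch only shows the shape of the loop.

```python
import math

import torch
from torch import nn


def fit_deep_prior(corrupted, steps=200, snapshot_every=50, lr=1e-3, seed=0):
    """Fit a small 1-D conv net to one corrupted waveform.

    corrupted: 1-D float tensor, the only training example.
    Returns (snapshots, losses): network outputs saved every
    `snapshot_every` steps, and the MSE recorded at every step.
    """
    torch.manual_seed(seed)
    target = corrupted.reshape(1, 1, -1)
    # Fixed white-noise input with the same length as the target.
    # It is sampled once and never updated.
    z = torch.randn_like(target)

    # Hypothetical stand-in network (assumption), NOT the architecture
    # used in the paper.
    net = nn.Sequential(
        nn.Conv1d(1, 16, kernel_size=9, padding=4),
        nn.ReLU(),
        nn.Conv1d(16, 16, kernel_size=9, padding=4),
        nn.ReLU(),
        nn.Conv1d(16, 1, kernel_size=9, padding=4),
    )
    opt = torch.optim.Adam(net.parameters(), lr=lr)

    snapshots, losses = [], []
    for step in range(1, steps + 1):
        opt.zero_grad()
        out = net(z)
        # The loss is measured against the corrupted signal only;
        # no clean reference is used anywhere.
        loss = torch.mean((out - target) ** 2)
        loss.backward()
        opt.step()
        losses.append(loss.item())
        if step % snapshot_every == 0:
            # Output computed just before this step's parameter update.
            snapshots.append(out.detach().reshape(-1).clone())
    return snapshots, losses


if __name__ == "__main__":
    # Toy example: a sine wave corrupted by additive Gaussian noise.
    t = torch.linspace(0, 1, 2048)
    clean = torch.sin(2 * math.pi * 8 * t)
    noisy = clean + 0.3 * torch.randn_like(clean)
    snapshots, losses = fit_deep_prior(noisy)
    print(len(snapshots), losses[0], losses[-1])
```

In this setup, early snapshots would be the candidates for a restored signal, and the choice of which snapshot to keep amounts to a stopping rule. How to pick that point is not specified in the abstract and is left open here.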
