Paper title
Deep Multi-Frame MVDR Filtering for Single-Microphone Speech Enhancement
Authors
Abstract
Multi-frame algorithms for single-microphone speech enhancement, e.g., the multi-frame minimum variance distortionless response (MFMVDR) filter, are able to exploit speech correlation across adjacent time frames in the short-time Fourier transform (STFT) domain. Provided that accurate estimates of the required speech interframe correlation vector and the noise correlation matrix are available, it has been shown that the MFMVDR filter yields substantial noise reduction while hardly introducing any speech distortion. Aiming to combine the speech enhancement potential of the MFMVDR filter with the estimation capability of temporal convolutional networks (TCNs), in this paper we propose to embed the MFMVDR filter within a deep learning framework. The TCNs are trained to map the noisy speech STFT coefficients to the required quantities by minimizing the scale-invariant signal-to-distortion ratio loss function at the MFMVDR filter output. Experimental results show that the proposed deep MFMVDR filter achieves competitive speech enhancement performance on the Deep Noise Suppression Challenge dataset. In particular, the results show that estimating the parameters of an MFMVDR filter yields higher PESQ and STOI scores than directly estimating the multi-frame filter or single-frame masks, and higher scores than Conv-TasNet.
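As a rough illustration of the two ingredients named in the abstract, the NumPy sketch below computes an MFMVDR filter for a single time-frequency bin and a scale-invariant signal-to-distortion ratio. The abstract does not state the filter formula, so the sketch assumes the standard MVDR closed form w = Φ⁻¹γ / (γᴴΦ⁻¹γ), with Φ the noise correlation matrix and γ the speech interframe correlation vector. The function names (`mfmvdr_weights`, `apply_filter`, `si_sdr`), the diagonal loading term, and the random test data are hypothetical choices made for this example and are not taken from the paper. In the proposed system the two quantities would come from the TCNs; here they are simply drawn at random.

```python
import numpy as np


def mfmvdr_weights(phi_n, gamma_x, diag_load=1e-6):
    """Assumed closed-form MFMVDR filter for one time-frequency bin.

    phi_n:   (N, N) Hermitian noise correlation matrix
    gamma_x: (N,)   speech interframe correlation vector
    """
    n = phi_n.shape[0]
    # Diagonal loading is an assumption of this sketch, added for numerical stability.
    load = diag_load * np.real(np.trace(phi_n)) / n
    phi = phi_n + load * np.eye(n)
    num = np.linalg.solve(phi, gamma_x)   # Phi^{-1} gamma
    den = np.vdot(gamma_x, num)           # gamma^H Phi^{-1} gamma
    return num / den


def apply_filter(w, y):
    """Filter output w^H y, where y stacks the N most recent noisy STFT coefficients."""
    return np.vdot(w, y)


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant SDR in dB for real-valued time-domain signals."""
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target           # part of the estimate aligned with the target
    residual = estimate - projection      # everything else counts as distortion
    return 10 * np.log10((np.dot(projection, projection) + eps)
                         / (np.dot(residual, residual) + eps))


# Toy data standing in for the quantities a network would estimate.
rng = np.random.default_rng(0)
N = 5                                     # number of frames used by the filter
a = rng.standard_normal((N, 20)) + 1j * rng.standard_normal((N, 20))
phi_n = a @ a.conj().T / 20               # Hermitian positive definite matrix
gamma_x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
gamma_x = gamma_x / gamma_x[0]            # normalise so the current-frame entry is 1

w = mfmvdr_weights(phi_n, gamma_x)
y = rng.standard_normal(N) + 1j * rng.standard_normal(N)
x_hat = apply_filter(w, y)                # enhanced STFT coefficient for this bin
```

Under this closed form the filter satisfies the distortionless constraint wᴴγ = 1, which is what allows noise to be reduced without distorting the correlated speech component. Training with the negative of `si_sdr` as the loss would require a differentiable framework rather than plain NumPy.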