FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement

Paper Title

FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement

Authors

Xiang Hao, Xiangdong Su, Radu Horaud, Xiaofei Li

Abstract

This paper proposes a full-band and sub-band fusion model, named FullSubNet, for single-channel real-time speech enhancement. Full-band and sub-band refer to models that take full-band and sub-band noisy spectral features as input and output full-band and sub-band speech targets, respectively. The sub-band model processes each frequency independently. Its input consists of one frequency and several context frequencies. Its output is the prediction of the clean speech target for the corresponding frequency. These two types of models have distinct characteristics. The full-band model can capture the global spectral context and the long-distance cross-band dependencies. However, it lacks the ability to model signal stationarity and to attend to local spectral patterns. The sub-band model is just the opposite. In our proposed FullSubNet, we connect a pure full-band model and a pure sub-band model sequentially and use practical joint training to integrate the advantages of the two types of models. We conducted experiments on the DNS Challenge (INTERSPEECH 2020) dataset to evaluate the proposed method. Experimental results show that full-band and sub-band information are complementary and that FullSubNet can effectively integrate them. Moreover, the performance of FullSubNet exceeds that of the top-ranked methods in the DNS Challenge (INTERSPEECH 2020).
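
To make the sub-band input concrete, the following is a minimal NumPy sketch of how "one frequency and several context frequencies" might be gathered from a noisy magnitude spectrogram, and of one plausible way to feed a full-band model's output into the sub-band model. The function names `subband_units` and `fuse`, the parameter `n_neighbors`, the wrap-around handling of edge frequencies, and the per-frequency concatenation are all assumptions made for illustration; the abstract does not specify them, and no neural network is included here.

```python
import numpy as np

def subband_units(spec, n_neighbors=2):
    """Build one sub-band unit per frequency.

    spec: (F, T) noisy magnitude spectrogram.
    Returns an array of shape (F, 2 * n_neighbors + 1, T), where entry f holds
    frequency f together with n_neighbors context frequencies on each side.
    """
    num_freqs, _ = spec.shape
    # Assumption of this sketch: edge frequencies borrow context by wrapping around.
    padded = np.pad(spec, ((n_neighbors, n_neighbors), (0, 0)), mode="wrap")
    # Unit f is the slice padded[f : f + 2N + 1], which is centred on frequency f.
    return np.stack(
        [padded[f:f + 2 * n_neighbors + 1] for f in range(num_freqs)]
    )

def fuse(spec, fullband_out, n_neighbors=2):
    """Append a full-band feature to every sub-band unit (hypothetical fusion).

    fullband_out: (F, T) array standing in for the output of a full-band model.
    Returns an array of shape (F, 2 * n_neighbors + 2, T).
    """
    units = subband_units(spec, n_neighbors)
    return np.concatenate([units, fullband_out[:, None, :]], axis=1)

# Toy example: 6 frequencies, 2 frames, one context frequency on each side.
spec = np.arange(12, dtype=float).reshape(6, 2)
units = subband_units(spec, n_neighbors=1)
fused = fuse(spec, fullband_out=np.zeros_like(spec), n_neighbors=1)
print(units.shape, fused.shape)
```

In a setup like this, a sub-band model would process each of the F units independently over time, which is what lets it focus on local spectral patterns, while the appended full-band feature carries the global context.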
