Paper Title
The PCG-AIID System for L3DAS22 Challenge: MIMO and MISO Convolutional Recurrent Network for Multi-Channel Speech Enhancement and Speech Recognition
Authors
Abstract
This paper describes the PCG-AIID system for Task 1 of the L3DAS22 challenge: 3D speech enhancement in a reverberant office environment. We propose a two-stage framework to address multi-channel speech denoising and dereverberation. In the first stage, a multiple-input multiple-output (MIMO) network removes background noise while preserving the spatial characteristics of the multi-channel signals. In the second stage, a multiple-input single-output (MISO) network enhances the speech from the desired direction and performs post-filtering. Our system ranked 3rd in the ICASSP 2022 L3DAS22 challenge and significantly outperforms the baseline system, achieving 3.2% WER and 0.972 STOI on the blind test set.
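The data flow of the two-stage framework can be illustrated with a minimal, shape-level sketch. This is not the authors' model: `mimo_stage`, `miso_stage`, `enhance`, and the `gain` parameter are hypothetical names, and both stages are trivial placeholders (a shared gain and a channel average) standing in for the convolutional recurrent networks. The sketch only shows the interfaces the abstract implies: the first stage maps C channels to C channels, and the second maps C channels to one.

```python
from typing import Callable, List

Signal = List[float]          # one channel of samples
MultiChannel = List[Signal]   # channels x samples


def mimo_stage(noisy: MultiChannel, gain: float = 0.5) -> MultiChannel:
    """Placeholder for the MIMO network: C channels in, C channels out.

    Assumption: the same gain is applied to every channel, so the level
    ratios between channels are unchanged. This is a crude stand-in for
    "preserving spatial characteristics", not a real denoiser.
    """
    return [[gain * x for x in channel] for channel in noisy]


def miso_stage(denoised: MultiChannel) -> Signal:
    """Placeholder for the MISO network: C channels in, one channel out.

    Assumption: a plain average across channels replaces the learned
    spatial filtering and post-filtering.
    """
    n_channels = len(denoised)
    return [sum(samples) / n_channels for samples in zip(*denoised)]


def enhance(
    noisy: MultiChannel,
    mimo: Callable[[MultiChannel], MultiChannel] = mimo_stage,
    miso: Callable[[MultiChannel], Signal] = miso_stage,
) -> Signal:
    """Run stage 1 (MIMO) and feed its multi-channel output to stage 2 (MISO)."""
    return miso(mimo(noisy))


if __name__ == "__main__":
    # Two channels, two samples each (toy values, not real audio).
    two_channel_input = [[1.0, 2.0], [3.0, 4.0]]
    print(enhance(two_channel_input))
```

Passing the stages as callables keeps the pipeline independent of how each stage is implemented, so a trained network could replace either placeholder as long as it keeps the same input and output shapes.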