MIMO-DBnet: Multi-channel Input and Multiple Outputs DOA-aware Beamforming Network for Speech Separation

Paper title

MIMO-DBnet: Multi-channel Input and Multiple Outputs DOA-aware Beamforming Network for Speech Separation

Authors

Fu, Yanjie; Yin, Haoran; Ge, Meng; Wang, Longbiao; Zhang, Gaoyan; Dang, Jianwu; Deng, Chengyun; Wang, Fei

Abstract

Recently, many deep-learning-based beamformers have been proposed for multi-channel speech separation. However, most of them rely on extra cues that must be known in advance, such as speaker features, face images, or directional information. In this paper, we propose MIMO-DBnet, an end-to-end beamforming network for direction-guided speech separation given only the mixture signal. Specifically, we design a multi-channel input and multiple outputs architecture that predicts direction-of-arrival (DOA) based embeddings and beamforming weights for each source. The precisely estimated directional embeddings provide effective spatial discrimination guidance for the neural beamformer to offset the effect of phase wrapping, allowing more accurate reconstruction of the two sources' speech signals. Experiments show that the proposed MIMO-DBnet not only achieves a comprehensive improvement over baseline systems, but also maintains its performance on high frequency bands when phase wrapping occurs.
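
The phase wrapping mentioned in the abstract can be illustrated with a small sketch. This is not the paper's method. It only shows the textbook far-field model for a single two-microphone pair, under assumptions that do not come from the source: a speed of sound of 343 m/s, a microphone spacing of 0.1 m, and a DOA measured from the array broadside. The function names `wrapped_ipd` and `aliasing_frequency` are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not taken from the paper


def wrapped_ipd(freq_hz, doa_deg, mic_distance_m):
    """Inter-microphone phase difference in radians, wrapped to [-pi, pi)."""
    # Far-field time difference of arrival for one microphone pair,
    # with the DOA measured from the array broadside.
    tdoa = mic_distance_m * math.sin(math.radians(doa_deg)) / SPEED_OF_SOUND
    phase = 2.0 * math.pi * freq_hz * tdoa
    # Only the wrapped phase is observable from a single frequency bin.
    return (phase + math.pi) % (2.0 * math.pi) - math.pi


def aliasing_frequency(mic_distance_m):
    """Frequency above which the phase difference can exceed pi and wrap."""
    return SPEED_OF_SOUND / (2.0 * mic_distance_m)


if __name__ == "__main__":
    d = 0.1  # assumed microphone spacing in metres
    doa_a = math.degrees(math.asin(0.25))   # about 14.5 degrees
    doa_b = math.degrees(math.asin(-0.75))  # about -48.6 degrees

    print("aliasing frequency (Hz):", aliasing_frequency(d))
    for freq in (500.0, 3430.0):
        print(freq, wrapped_ipd(freq, doa_a, d), wrapped_ipd(freq, doa_b, d))
```

With these assumed values, the two directions give clearly different phase differences at 500 Hz, but at 3430 Hz (above the aliasing frequency) both wrap to the same value of about π/2. Phase alone then cannot tell the two directions apart in that frequency bin, which is the kind of ambiguity that an explicit directional cue is meant to help resolve.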
