Exploiting spatial information with the informed complex-valued spatial autoencoder for target speaker extraction

Title

Exploiting spatial information with the informed complex-valued spatial autoencoder for target speaker extraction

Authors

Annika Briegleb, Mhd Modar Halimeh, Walter Kellermann

Abstract

In conventional multichannel audio signal enhancement, spatial and spectral filtering are often performed sequentially. In contrast, it has been shown that for neural spatial filtering a joint approach of spectro-spatial filtering is more beneficial. In this contribution, we investigate the spatial filtering performed by such a time-varying spectro-spatial filter. We extend the recently proposed complex-valued spatial autoencoder (COSPA) for the task of target speaker extraction by leveraging its interpretable structure and purposefully informing the network of the target speaker's position. We show that the resulting informed COSPA (iCOSPA) effectively and flexibly extracts a target speaker from a mixture of speakers. We also find that the proposed architecture is well capable of learning pronounced spatial selectivity patterns and show that the results depend significantly on the training target and the reference signal when computing various evaluation metrics.
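The abstract refers to the "spatial selectivity patterns" learned by a spectro-spatial filter. One common way to visualize such a pattern (not necessarily the procedure used in the paper) is to evaluate the filter's response to a far-field plane wave from each direction, i.e. the beampattern |w(f)^H d(θ, f)| for per-frequency complex weights w(f) and steering vector d(θ, f). The sketch below assumes a uniform linear array, a speed of sound of 343 m/s, and hypothetical helper names (`ula_steering_vector`, `beampattern`). As a stand-in for a learned filter it uses delay-and-sum weights steered to the target direction, whose response should peak at unity there. For a time-varying filter such as iCOSPA, the same evaluation would presumably be repeated per time frame.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (assumed value)


def ula_steering_vector(theta_deg, freq_hz, num_mics, spacing_m):
    """Far-field steering vector of a uniform linear array (assumed geometry).

    theta_deg is measured from the array axis (0 deg = endfire, 90 deg = broadside).
    """
    positions = np.arange(num_mics) * spacing_m
    delays = positions * np.cos(np.deg2rad(theta_deg)) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freq_hz * delays)


def beampattern(weights, freq_hz, thetas_deg, spacing_m):
    """Magnitude response |w^H d(theta, f)| of one frequency bin's filter weights."""
    num_mics = weights.shape[0]
    # np.vdot conjugates its first argument, so this computes w^H d.
    return np.array([
        np.abs(np.vdot(weights, ula_steering_vector(t, freq_hz, num_mics, spacing_m)))
        for t in thetas_deg
    ])


# Usage: delay-and-sum weights steered to 60 deg (stand-in for a learned filter).
num_mics, spacing, freq = 4, 0.05, 1000.0
target_deg = 60.0
w_ds = ula_steering_vector(target_deg, freq, num_mics, spacing) / num_mics
thetas = np.arange(0, 181, 1.0)
pattern = beampattern(w_ds, freq, thetas, spacing)
print(thetas[np.argmax(pattern)])  # 60.0
```

Plotting `pattern` over `thetas` for several frequency bins gives the kind of direction-dependent gain map in which pronounced spatial selectivity, or its absence, becomes visible.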
