Iterative Sound Source Localization for Unknown Number of Sources

Paper Title

Iterative Sound Source Localization for Unknown Number of Sources

Authors

Yanjie Fu, Meng Ge, Haoran Yin, Xinyuan Qian, Longbiao Wang, Gaoyan Zhang, Jianwu Dang

Abstract

Sound source localization aims to estimate the direction of arrival (DOA) of all sound sources from observed multi-channel audio. For the practical case of an unknown number of sources, existing localization algorithms predict a likelihood-based coding (i.e., a spatial spectrum) and apply a pre-determined threshold to detect the number of sources and their corresponding DOA values. However, these threshold-based algorithms are unstable because their performance depends on a careful choice of threshold. To address this problem, we propose an iterative sound source localization approach called ISSL, which extracts each source's DOA iteratively, without a threshold, until a termination criterion is met. Unlike threshold-based algorithms, ISSL uses an active source detector network, based on a binary classifier, that takes the residual spatial spectrum as input and decides whether to stop the iteration. As a result, ISSL can handle an arbitrary number of sources, even more than the number seen during training. Experimental results show that ISSL achieves significant performance improvements in both DOA estimation and source number detection compared with existing threshold-based algorithms.
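
The iterate-until-the-detector-says-stop control flow described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the authors' implementation: the 1-degree azimuth grid, the Gaussian-shaped peaks, the subtraction used to form the residual spectrum, and the names `make_spectrum`, `placeholder_detector`, and `iterative_localize` are all hypothetical. In particular, `placeholder_detector` is a hand-written energy rule standing in for the trained binary classifier, which the abstract does not specify in detail.

```python
import math

N_BINS = 360   # assumed 1-degree azimuth grid
SIGMA = 8.0    # assumed peak width in degrees


def peak(doa):
    """Gaussian-shaped bump centred on `doa`, wrapping around the circle."""
    out = []
    for angle in range(N_BINS):
        d = abs(angle - doa) % N_BINS
        d = min(d, N_BINS - d)  # circular angular distance
        out.append(math.exp(-(d * d) / (SIGMA * SIGMA)))
    return out


def make_spectrum(doas):
    """Toy spatial spectrum: one bump per source, combined with max."""
    spectrum = [0.0] * N_BINS
    for doa in doas:
        spectrum = [max(s, p) for s, p in zip(spectrum, peak(doa))]
    return spectrum


def placeholder_detector(residual):
    """Stand-in for the learned active source detector (assumption).

    The paper uses a trained binary classifier here; this fixed rule only
    makes the sketch runnable.
    """
    return max(residual) > 0.5


def iterative_localize(spectrum, detector, max_iters=10):
    """Extract one DOA per iteration until the detector reports no source."""
    residual = list(spectrum)
    doas = []
    for _ in range(max_iters):
        if not detector(residual):
            break  # termination criterion: no active source left
        doa = max(range(N_BINS), key=residual.__getitem__)  # strongest direction
        doas.append(doa)
        # Remove the extracted source's contribution to form the residual.
        residual = [max(r - p, 0.0) for r, p in zip(residual, peak(doa))]
    return doas


print(iterative_localize(make_spectrum([30, 150, 270]), placeholder_detector))
```

The loop never needs to know the source count in advance: it keeps extracting until the detector rejects the residual, which is the property the abstract credits for handling more sources than were seen during training.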
