Block-Online Guided Source Separation

Paper Title

Block-Online Guided Source Separation

Authors

Shota Horiguchi, Yusuke Fujita, Kenji Nagamatsu

Abstract

We propose a block-online algorithm for guided source separation (GSS). GSS is a speech separation method that uses diarization information to update the parameters of a generative model of the observed signals. Previous studies have shown that GSS performs well in multi-talker scenarios. However, it requires a large amount of computation time, which is an obstacle to deploying it in online applications. Offline GSS is also an utterance-wise algorithm, so its latency grows with the length of the utterance. In the proposed algorithm, the input samples of each block and their corresponding time annotations are concatenated with those of the preceding context and used to update the parameters. Using the context enables the algorithm to estimate time-frequency masks accurately from only one optimization iteration per block, and its latency depends not on the utterance length but on the predetermined block length. It also reduces computational cost by updating only the parameters of the speakers who are active in each block and its context. Evaluation on the CHiME-6 corpus and a meeting corpus showed that the proposed algorithm achieved almost the same performance as the conventional offline GSS algorithm with 32x faster computation, which is sufficient for real-time applications.
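The block-plus-context windowing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `iter_blocks_with_context`, its parameters, and the array layouts are assumptions made for illustration, and the mask estimation step itself (the single optimization iteration per block) is left out. The sketch only shows how each block could be concatenated with its preceding context and how the active speakers for that window could be selected from the diarization labels.

```python
import numpy as np


def iter_blocks_with_context(features, activity, block_len, context_len):
    """Yield one window per block (hypothetical helper, not from the paper).

    features: array of shape (T, ...), one entry per frame.
    activity: boolean array of shape (K, T), diarization labels per speaker.
    Yields (start, window_features, window_activity, active, n_context), where
    `active` holds the indices of speakers active anywhere in the window and
    `n_context` is the number of leading frames that belong to the context.
    """
    num_frames = features.shape[0]
    for start in range(0, num_frames, block_len):
        end = min(start + block_len, num_frames)
        # The context is the preceding frames, clipped at the signal start.
        ctx_start = max(0, start - context_len)
        window_features = features[ctx_start:end]
        window_activity = activity[:, ctx_start:end]
        # Only speakers active in the block or its context need updating.
        active = np.flatnonzero(window_activity.any(axis=1))
        yield start, window_features, window_activity, active, start - ctx_start
```

In this sketch the latency is bounded by `block_len` rather than by the utterance length, and a parameter update would only need to touch the speakers listed in `active`, which mirrors the two cost reductions the abstract describes.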
