Title
Dual-path Self-Attention RNN for Real-Time Speech Enhancement
Authors
Abstract
We propose a dual-path self-attention recurrent neural network (DP-SARNN) for time-domain speech enhancement. We improve the dual-path RNN (DP-RNN) by augmenting its inter-chunk and intra-chunk RNNs with a recently proposed efficient attention mechanism. The combination of inter-chunk and intra-chunk attention makes attention effective over long sequences of speech frames. DP-SARNN outperforms a baseline DP-RNN while using a frame shift four times larger than that of DP-RNN, which substantially reduces computation time per utterance. Building on this, we develop a real-time DP-SARNN by using long short-term memory (LSTM) RNNs and causal attention in the inter-chunk SARNN. DP-SARNN significantly outperforms existing approaches to speech enhancement, and on average takes 7.9 ms of CPU time to process a 32 ms signal chunk.
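The dual-path idea described in the abstract can be illustrated with a minimal sketch: a long frame sequence is segmented into overlapping chunks, an intra-chunk pass models local structure within each chunk, and an inter-chunk pass models global structure across chunks. The helper names (`segment`, `dual_path_pass`) and the identity placeholder functions are illustrative assumptions, not the paper's actual RNN/attention modules.

```python
import numpy as np

def segment(x, chunk_len, hop):
    """Split a [T, F] feature sequence into overlapping chunks [N, chunk_len, F].

    Zero-pads the end so the last chunk is full. Names and padding scheme
    are a simplification of the dual-path segmentation, not the paper's code.
    """
    T, F = x.shape
    n = max(0, int(np.ceil((T - chunk_len) / hop))) + 1  # number of chunks
    pad = (n - 1) * hop + chunk_len - T
    x = np.pad(x, ((0, pad), (0, 0)))
    return np.stack([x[i * hop : i * hop + chunk_len] for i in range(n)])

def dual_path_pass(chunks, intra_fn, inter_fn):
    """One dual-path block over chunks of shape [N, L, F].

    intra_fn processes the L frames inside each chunk (local path);
    inter_fn processes, for each within-chunk position, the sequence of
    N chunks (global path). In DP-SARNN these would be RNNs with
    attention; here they are arbitrary callables.
    """
    N, L, F = chunks.shape
    intra = np.stack([intra_fn(c) for c in chunks])                      # [N, L, F]
    inter = np.stack([inter_fn(intra[:, j]) for j in range(L)], axis=1)  # [N, L, F]
    return inter

# Toy usage: 10 frames of 2-d features, chunks of 4 frames with hop 2.
x = np.arange(20.0).reshape(10, 2)
chunks = segment(x, chunk_len=4, hop=2)          # shape (4, 4, 2)
out = dual_path_pass(chunks, lambda c: c, lambda s: s)  # identity paths
```

Because intra- and inter-chunk passes each see sequences of roughly sqrt(T) length instead of T, both the RNNs and the attention operate over much shorter sequences, which is what makes attention tractable for long utterances.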