Noise-robust Speech Recognition with 10 Minutes Unparalleled In-domain Data

Paper Title

Noise-robust Speech Recognition with 10 Minutes Unparalleled In-domain Data

Authors

Chen, Chen; Hou, Nana; Hu, Yuchen; Shirol, Shashank; Chng, Eng Siong

Abstract

Noise-robust speech recognition systems require large amounts of training data, including noisy speech and the corresponding transcripts, to achieve state-of-the-art performance across varied practical environments. However, such a large amount of in-domain data is not always available in the real world. In this paper, we propose a generative adversarial network that simulates a noisy spectrum from a clean spectrum (Simu-GAN), where only 10 minutes of unparalleled in-domain noisy speech data is required as labels. Furthermore, we propose a dual-path speech recognition system to improve robustness under noisy conditions. Experimental results show that the proposed speech recognition system, using noisy data simulated by Simu-GAN, achieves a 7.3% absolute improvement in word error rate (WER) over the best baseline.
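The abstract reports its result as an absolute improvement in WER. As a minimal sketch of how that metric is commonly computed (not the authors' evaluation code), the Python below counts word-level substitutions, deletions, and insertions with a Levenshtein alignment and divides by the reference length. The function names `word_error_rate` and `absolute_improvement` and the example sentences are hypothetical and chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def absolute_improvement(baseline_wer: float, system_wer: float) -> float:
    """Absolute improvement is the plain difference between two WER values."""
    return baseline_wer - system_wer


# Illustrative sentences only; these are not data from the paper.
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

In the example, one substitution ("sat" to "sit") and one deletion (the second "the") against six reference words give a WER of 2/6. An absolute improvement is the difference between two such rates, expressed in percentage points, as opposed to a relative improvement, which divides that difference by the baseline WER.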
