Paper Title

Noise-robust Speech Recognition with 10 Minutes Unparalleled In-domain Data

Authors

Chen Chen, Nana Hou, Yuchen Hu, Shashank Shirol, Eng Siong Chng

Abstract

Noise-robust speech recognition systems require large amounts of training data, including noisy speech and the corresponding transcripts, to achieve state-of-the-art performance across diverse practical environments. However, such abundant in-domain data is not always available in real-world settings. In this paper, we propose a generative adversarial network that simulates noisy spectra from clean spectra (Simu-GAN), requiring only 10 minutes of unparalleled (i.e., non-parallel) in-domain noisy speech data as labels. Furthermore, we propose a dual-path speech recognition system to improve robustness under noisy conditions. Experimental results show that, when trained with noisy data simulated by Simu-GAN, the proposed speech recognition system achieves a 7.3% absolute improvement in word error rate (WER) over the best baseline.
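The result above is stated as an absolute WER improvement. As a hedged illustration (not the authors' scoring code), WER is the word-level edit distance (substitutions + deletions + insertions) between reference and hypothesis transcripts, divided by the number of reference words. An "absolute" improvement is a difference in percentage points, not a fraction of the baseline. The sketch below assumes simple whitespace tokenization; the function names and the numbers in the usage note are hypothetical.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, deletions and insertions
    (Levenshtein distance over words) that turn reference into hypothesis."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[-1]


def wer(references, hypotheses) -> float:
    """Corpus-level WER in percent: total word errors / total reference words."""
    errors = sum(word_errors(r, h) for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    return 100.0 * errors / words


def absolute_and_relative_gain(baseline_wer: float, new_wer: float):
    """Absolute gain is in percentage points; relative gain is a percentage
    of the baseline WER."""
    absolute = baseline_wer - new_wer
    relative = 100.0 * absolute / baseline_wer
    return absolute, relative
```

For example, with a hypothetical baseline WER of 20.0% and a system WER of 12.7%, `absolute_and_relative_gain(20.0, 12.7)` returns roughly `(7.3, 36.5)`: a 7.3-point absolute gain, which is a 36.5% relative reduction. These numbers are illustrative only and are not taken from the paper.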