Speech Augmentation Based Unsupervised Learning for Keyword Spotting

Paper Title

Speech Augmentation Based Unsupervised Learning for Keyword Spotting

Authors

Jian Luo, Jianzong Wang, Ning Cheng, Haobin Tang, Jing Xiao

Abstract

In this paper, we investigate a speech-augmentation-based unsupervised learning approach for the keyword spotting (KWS) task. KWS is a useful speech application, yet it depends heavily on labeled data. We design a CNN-Attention architecture to perform KWS: the CNN layers focus on local acoustic features, and the attention layers model long-term dependencies. To improve the robustness of the KWS model, we also propose an unsupervised learning method. The unsupervised loss is based on the similarity between the original and augmented speech features, together with audio reconstruction information. Two speech augmentation methods are explored in the unsupervised learning: speed and intensity. Experiments on the Google Speech Commands V2 dataset show that our CNN-Attention model achieves competitive results. Moreover, augmentation-based unsupervised learning further improves the classification accuracy of the KWS task. In our experiments, with augmentation-based unsupervised learning, our KWS model outperforms other unsupervised methods such as CPC, APC, and MPC.
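The abstract names two augmentations (speed and intensity) and a loss that combines feature similarity with audio reconstruction, but gives no formulas. The following is a minimal sketch of how such pieces could fit together, not the authors' implementation. Everything specific in it is an assumption made for illustration: the function names, linear-interpolation resampling for the speed change, a plain gain factor for intensity, mean pooling over time before comparing features, mean squared error for both loss terms, and the `weight` parameter.

```python
import numpy as np


def change_speed(wave, rate):
    """Speed perturbation by linear-interpolation resampling (assumed method).

    rate > 1 yields a shorter (faster) signal, rate < 1 a longer (slower) one.
    """
    n_out = max(1, int(round(len(wave) / rate)))
    src_pos = np.linspace(0.0, len(wave) - 1, num=n_out)
    return np.interp(src_pos, np.arange(len(wave)), wave)


def change_intensity(wave, gain):
    """Intensity perturbation: scale the amplitude, then clip to [-1, 1]."""
    return np.clip(wave * gain, -1.0, 1.0)


def unsupervised_loss(feat_orig, feat_aug, recon, target, weight=1.0):
    """Similarity term plus reconstruction term (both MSE, by assumption).

    feat_orig: (T1, D) encoder features of the original audio
    feat_aug:  (T2, D) encoder features of the augmented audio
    recon, target: arrays of equal shape (reconstructed vs. reference features)
    """
    # Speed changes alter the number of frames, so pool over time first.
    pooled_orig = feat_orig.mean(axis=0)
    pooled_aug = feat_aug.mean(axis=0)
    similarity = np.mean((pooled_orig - pooled_aug) ** 2)
    reconstruction = np.mean((recon - target) ** 2)
    return similarity + weight * reconstruction
```

In a pretraining loop, the original waveform and its augmented copy would each pass through the same encoder, and the loss above would push the two representations together while a decoder head reconstructs the audio features. The labeled KWS classifier would then be fine-tuned from that encoder.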
