Paper title
Small footprint Text-Independent Speaker Verification for Embedded Systems
Authors
Abstract
Deep neural network approaches to speaker verification have proven successful, but the typical computational requirements of state-of-the-art (SOTA) systems make them unsuitable for embedded applications. In this work, we present a two-stage model architecture that is orders of magnitude smaller than common solutions (237.5K learnable parameters, 11.5 MFLOPS) and reaches a competitive 3.31% Equal Error Rate (EER) on the well-established VoxCeleb1 verification test set. We demonstrate that our solution can run on small devices typical of IoT systems, such as the Raspberry Pi 3B, with a latency below 200 ms on a 5 s utterance. Additionally, we evaluate our model on the acoustically challenging VOiCES corpus. We report a limited increase in EER of 2.6 percentage points with respect to the best-scoring model of the 2019 VOiCES from a Distance Challenge, against a 25.6-fold reduction in the number of learnable parameters.
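The abstract reports its results as Equal Error Rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. As a minimal sketch of how such a figure can be computed from trial scores, and not the authors' evaluation code, the hypothetical function `equal_error_rate` below sweeps every observed score as a candidate threshold and returns the point where the two error rates are closest. It assumes that higher scores mean "same speaker" and that a trial is accepted when its score is greater than or equal to the threshold.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Return (eer, threshold) for a set of verification trial scores.

    target_scores: scores of same-speaker trials (assumed: higher = more similar).
    nontarget_scores: scores of different-speaker trials.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)

    # Every observed score is a candidate decision threshold.
    thresholds = np.sort(np.concatenate([target, nontarget]))

    # False rejection rate: fraction of target trials scored below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance rate: fraction of non-target trials scored at or above it.
    far = np.array([(nontarget >= t).mean() for t in thresholds])

    # The EER is taken where the two error curves are closest; with a finite
    # number of trials they may not cross exactly, so the two rates are averaged.
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2.0
    return float(eer), float(thresholds[idx])
```

For instance, with the toy target scores `[0.9, 0.8, 0.3, 0.7]` and non-target scores `[0.1, 0.2, 0.75, 0.4]`, one trial of each kind is misclassified at a threshold of 0.7, so the function returns an EER of 0.25. Evaluation toolkits usually interpolate between thresholds instead of picking the nearest one, so values on a real trial list may differ slightly from this sketch.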