Cross-Age Speaker Verification: Learning Age-Invariant Speaker Embeddings

Paper Title

Cross-Age Speaker Verification: Learning Age-Invariant Speaker Embeddings

Authors

Xiaoyi Qin, Na Li, Chao Weng, Dan Su, Ming Li

Abstract

Automatic speaker verification has made remarkable progress in recent years. However, there is little research on cross-age speaker verification (CASV) because relevant data is scarce. In this paper, we mine cross-age test sets from the VoxCeleb dataset and propose an age-invariant speaker representation (AISR) learning method. Because VoxCeleb is collected from YouTube, the dataset inherently contains cross-age data, but its metadata does not include speaker age labels. We therefore apply a face age estimation method to predict each speaker's age from the associated visual data, then label the audio recording with the estimated age. We construct multiple cross-age test sets on VoxCeleb (Vox-CA), which deliberately select positive trials with a large age gap. In selecting negative pairs, nationality and gender are also taken into account, to align with the Vox-H cases. The baseline system's performance degrades from 1.939% EER on the Vox-H test set to 10.419% EER on the Vox-CA20 test set, which indicates how difficult the cross-age scenario is. Consequently, we propose an age-decoupling adversarial learning (ADAL) method to alleviate the negative effect of the age gap and reduce intra-class variance. Our method outperforms the baseline system by over 10% relative EER reduction on the Vox-CA20 test set. The source code and trial resources are available at https://github.com/qinxiaoyi/Cross-Age_Speaker_Verification
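
As a rough illustration of the positive-trial selection described in the abstract, the sketch below pairs utterances of the same speaker and keeps only the pairs whose estimated ages differ by at least a threshold. The function name, the tuple layout, the file names, and the 20-year threshold are assumptions made for this example; they are not taken from the paper or its repository. Negative-pair selection (nationality, gender) is not covered.

```python
from itertools import combinations


def mine_cross_age_positive_trials(utterances, min_age_gap):
    """Return same-speaker utterance pairs with a large estimated age gap.

    utterances: iterable of (speaker_id, utterance_id, estimated_age) tuples,
    where estimated_age is assumed to come from a face age estimator.
    min_age_gap: hypothetical threshold in years.
    """
    # Group utterances by speaker so that only positive (same-speaker)
    # pairs are ever formed.
    by_speaker = {}
    for speaker_id, utt_id, age in utterances:
        by_speaker.setdefault(speaker_id, []).append((utt_id, age))

    trials = []
    for items in by_speaker.values():
        for (utt_a, age_a), (utt_b, age_b) in combinations(items, 2):
            # Keep the pair only if the estimated ages are far enough apart.
            if abs(age_a - age_b) >= min_age_gap:
                trials.append((utt_a, utt_b))
    return trials


# Toy data: the ages are made up for illustration.
utterances = [
    ("spk1", "spk1/a.wav", 25.0),
    ("spk1", "spk1/b.wav", 48.0),
    ("spk1", "spk1/c.wav", 30.0),
    ("spk2", "spk2/a.wav", 40.0),
    ("spk2", "spk2/b.wav", 42.0),
]
trials = mine_cross_age_positive_trials(utterances, min_age_gap=20)
print(trials)
```

With this toy data, only the spk1 pair with a 23-year gap passes the 20-year threshold. Lowering `min_age_gap` admits more pairs, so the threshold directly controls how hard the resulting positive trials are.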
