Paper title
VoicePrivacy 2022 System Description: Speaker Anonymization with Feature-matched F0 Trajectories
Paper authors
Paper abstract
We introduce a novel method to improve the performance of the VoicePrivacy Challenge 2022 baseline B1 variants. One known deficiency of x-vector-based anonymization systems is the insufficient disentangling of the input features. In particular, the fundamental frequency (F0) trajectories are used for voice synthesis without any modification. Especially in cross-gender conversion, this causes unnatural-sounding voices, increases word error rates (WERs), and leaks personal information. Our submission overcomes this problem by synthesizing an F0 trajectory that better harmonizes with the anonymized x-vector. We use a low-complexity deep neural network to estimate an appropriate F0 value per frame from the linguistic content of the bottleneck (BN) features and the anonymized x-vector. Our approach results in a significantly improved anonymization system and increased naturalness of the synthesized voice. Consequently, our results suggest that F0 extraction is not required for voice anonymization.
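The abstract does not specify the network architecture, so the following is only a minimal sketch of the input/output structure it describes: per-frame BN features plus one utterance-level anonymized x-vector go in, and one F0 value per frame comes out, with no F0 extracted from the source speech. Everything concrete here is an assumption for illustration: the hypothetical class name FrameF0Estimator, the two-layer fully connected design, the hidden size, the softplus output (used only to keep predictions non-negative), and the untrained random weights. It is written in plain Python so it runs without external libraries; it is not the authors' model.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights for one fully connected layer (untrained, illustration only)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Affine transform W x + b for a single input vector x."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def softplus(v):
    # Numerically stable log(1 + exp(v)); keeps the predicted F0 non-negative.
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


class FrameF0Estimator:
    """Hypothetical per-frame regressor: [BN frame ; anonymized x-vector] -> F0 value."""

    def __init__(self, bn_dim, xvec_dim, hidden_dim=64, seed=0):
        rng = random.Random(seed)
        self.hidden = init_layer(bn_dim + xvec_dim, hidden_dim, rng)
        self.out = init_layer(hidden_dim, 1, rng)

    def predict(self, bn_frames, xvector):
        """Return one F0 value per BN frame; no F0 is extracted from the input audio."""
        f0 = []
        for frame in bn_frames:
            # The same utterance-level x-vector is appended to every frame.
            h = relu(dense(frame + xvector, self.hidden))
            f0.append(softplus(dense(h, self.out)[0]))
        return f0


if __name__ == "__main__":
    rng = random.Random(1)
    bn = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]  # 5 frames, toy 8-dim BN
    xvec = [rng.gauss(0.0, 1.0) for _ in range(4)]  # toy 4-dim anonymized x-vector
    model = FrameF0Estimator(bn_dim=8, xvec_dim=4)
    print(model.predict(bn, xvec))
```

The toy dimensions are deliberately small; a real system would use the dimensions of its own BN extractor and x-vector model, and the weights would be trained (for example, against F0 targets extracted from training data) rather than drawn at random, so the values printed here carry no meaning in Hz.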