Paper title
The IQIYI System for Voice Conversion Challenge 2020
Paper authors
Paper abstract
This paper presents the IQIYI voice conversion system (T24) for the Voice Conversion Challenge 2020. In the competition, each target speaker has 70 sentences. We built an end-to-end voice conversion system based on phonetic posteriorgrams (PPG). First, an ASR acoustic model computes bottleneck (BN) features, which represent the content-related information in the speech. Next, an improved prosody Tacotron model computes the Mel features from the BN features. Finally, an improved LPCNet converts the Mel spectrum to a waveform. The evaluation results show that the system achieves good voice conversion quality. Using audio sampled at 16 kHz rather than 24 kHz, the converted speech is relatively good in naturalness and similarity. Our best results are in the similarity evaluation of Task 2: 2nd place in the ASV-based objective evaluation and 5th place in the subjective evaluation.
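The three stages named in the abstract (ASR acoustic model to BN features, BN features to Mel features, Mel features to waveform) can be pictured as a simple function composition. The sketch below is only an illustration of that data flow: the function names, the `HOP` value, and the toy stand-in stages are all hypothetical and are not the authors' models or any library's API.

```python
from typing import Callable, List, Sequence

Frames = List[List[float]]  # one feature vector per frame

HOP = 160  # assumed hop size (10 ms at 16 kHz); not stated in the abstract


def convert_voice(
    source_wav: Sequence[float],
    extract_bn: Callable[[Sequence[float]], Frames],
    bn_to_mel: Callable[[Frames], Frames],
    vocode: Callable[[Frames], List[float]],
) -> List[float]:
    """Chain the three stages: content features, then Mel, then waveform."""
    bn = extract_bn(source_wav)   # stage 1: content features (ASR model in the paper)
    mel = bn_to_mel(bn)           # stage 2: Mel features (Tacotron-style model in the paper)
    return vocode(mel)            # stage 3: waveform (LPCNet-style vocoder in the paper)


# Toy stand-ins so the sketch runs; they do no real speech processing.
def toy_extract_bn(wav: Sequence[float]) -> Frames:
    # One 1-dimensional "feature" per frame: the mean absolute amplitude.
    n_frames = len(wav) // HOP
    return [
        [sum(abs(s) for s in wav[i * HOP:(i + 1) * HOP]) / HOP]
        for i in range(n_frames)
    ]


def toy_bn_to_mel(bn: Frames) -> Frames:
    # Placeholder mapping: scale every feature value by 2.
    return [[2.0 * v for v in frame] for frame in bn]


def toy_vocode(mel: Frames) -> List[float]:
    # Placeholder vocoder: hold each frame's first value for HOP samples.
    out: List[float] = []
    for frame in mel:
        out.extend([frame[0]] * HOP)
    return out


source = [0.5] * (HOP * 3)  # three frames of constant "audio"
converted = convert_voice(source, toy_extract_bn, toy_bn_to_mel, toy_vocode)
```

Passing the stages in as callables keeps the composition independent of any particular model, so each placeholder could be swapped for a real component without changing `convert_voice`.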