Paper Title
Enhancement of Pitch Controllability using Timbre-Preserving Pitch Augmentation in FastPitch
Authors
Abstract
FastPitch, a recently developed pitch-controllable text-to-speech (TTS) model, is conditioned on pitch contours. However, the quality of the synthesized speech degrades considerably for pitch values that deviate significantly from the average pitch; i.e., the ability to control pitch is limited. To address this issue, we propose two algorithms that improve the robustness of FastPitch. First, we propose a novel timbre-preserving pitch-shifting algorithm for natural pitch augmentation. Speech samples pitch-shifted with the proposed algorithm sound more natural because the speaker's vocal timbre is maintained. Second, we propose a training algorithm for FastPitch that uses pitch-augmented speech datasets with different pitch ranges for the same sentence. The experimental results demonstrate that the proposed algorithms improve the pitch controllability of FastPitch.
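To make the idea of "different pitch ranges for the same sentence" concrete, the following is a minimal sketch of how the target F0 contours for such an augmented dataset might be laid out. It is not the paper's timbre-preserving pitch-shifting algorithm, which the abstract does not specify; it only applies the standard semitone relation (a shift of n semitones scales frequency by 2^(n/12)) to a frame-level F0 contour. The function names, the shift values, and the convention that 0.0 marks an unvoiced frame are all assumptions made for illustration.

```python
# Illustrative sketch only: scales an F0 contour by a semitone offset.
# It does NOT modify audio and does NOT preserve timbre; it only shows
# how one sentence can map to several pitch ranges.

def shift_f0(f0_hz, semitones):
    """Shift an F0 contour (Hz per frame) by a number of semitones.

    Assumption: unvoiced frames are encoded as 0.0 and are left unchanged.
    """
    ratio = 2.0 ** (semitones / 12.0)  # standard equal-tempered ratio
    return [f * ratio if f > 0 else 0.0 for f in f0_hz]


def augment_pitch_ranges(f0_hz, shifts=(-4, -2, 0, 2, 4)):
    """Return one shifted contour per semitone offset (hypothetical shift set)."""
    return {s: shift_f0(f0_hz, s) for s in shifts}


def mean_voiced(f0_hz):
    """Average F0 over voiced frames only; 0.0 if there are none."""
    voiced = [f for f in f0_hz if f > 0]
    return sum(voiced) / len(voiced) if voiced else 0.0


if __name__ == "__main__":
    contour = [110.0, 0.0, 123.5, 130.8, 0.0, 98.0]  # made-up example frames
    for shift, shifted in augment_pitch_ranges(contour).items():
        print(f"{shift:+d} semitones: mean voiced F0 = {mean_voiced(shifted):.1f} Hz")
```

In an actual pipeline, each shifted contour would be paired with audio that has been pitch-shifted by the same amount, so that the model sees the same text at several average pitches during training.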