SkinAugment: Auto-Encoding Speaker Conversions for Automatic Speech Translation

Paper Title

SkinAugment: Auto-Encoding Speaker Conversions for Automatic Speech Translation

Authors

Arya D. McCarthy, Liezl Puzon, Juan Pino

Abstract

We propose autoencoding speaker conversion for training data augmentation in automatic speech translation. This technique directly transforms an audio sequence, resulting in audio synthesized to resemble another speaker's voice. Our method compares favorably to SpecAugment on English$\to$French and English$\to$Romanian automatic speech translation (AST) tasks as well as on a low-resource English automatic speech recognition (ASR) task. Further, in ablations, we show the benefits of both quantity and diversity in augmented data. Finally, we show that we can combine our approach with augmentation by machine-translated transcripts to obtain a competitive end-to-end AST model that outperforms a very strong cascade model on an English$\to$French AST task. Our method is sufficiently general that it can be applied to other speech generation and analysis tasks.
