Generating Synthetic Speech from SpokenVocab for Speech Translation

Paper title

Generating Synthetic Speech from SpokenVocab for Speech Translation

Authors

Zhao, Jinming; Haffar, Gholamreza; Shareghi, Ehsan

Abstract

Training end-to-end speech translation (ST) systems requires sufficiently large-scale data, which is unavailable for most language pairs and domains. One practical solution to the data scarcity issue is to convert machine translation (MT) data to ST data via text-to-speech (TTS) systems. Yet using TTS systems can be tedious and slow, as the conversion needs to be done for each MT dataset. In this work, we propose a simple, scalable and effective data augmentation technique, SpokenVocab, to convert MT data to ST data on the fly. The idea is to retrieve and stitch audio snippets from a SpokenVocab bank according to the words in an MT sequence. Our experiments on multiple language pairs from MuST-C show that this method outperforms strong baselines by an average of 1.83 BLEU points, and that it performs as well as TTS-generated speech. We also showcase how SpokenVocab can be applied in code-switching ST, for which TTS systems often do not exist. Our code is available at https://github.com/mingzi151/SpokenVocab
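
The retrieve-and-stitch idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (their code is in the repository linked above). It assumes a hypothetical bank stored as a dictionary from words to pre-generated audio snippets, represented here as plain lists of samples. The function name `stitch_speech`, the `<unk>` fallback for out-of-vocabulary words, and the lowercasing step are all assumptions made for illustration.

```python
from typing import Dict, List

Waveform = List[float]  # mono audio samples; a real system would likely use arrays


def stitch_speech(
    sentence: str,
    bank: Dict[str, Waveform],
    unk_token: str = "<unk>",
) -> Waveform:
    """Concatenate pre-generated word snippets in the order the words appear."""
    stitched: Waveform = []
    for word in sentence.lower().split():
        # Assumption: words missing from the bank fall back to a placeholder snippet.
        snippet = bank.get(word, bank[unk_token])
        stitched.extend(snippet)
    return stitched


# Toy bank with made-up values standing in for TTS-generated word audio.
toy_bank: Dict[str, Waveform] = {
    "hello": [0.1, 0.2],
    "world": [0.3, 0.4, 0.5],
    "<unk>": [0.0],
}

audio = stitch_speech("Hello brave world", toy_bank)
```

Because the bank is built once and each source sentence only needs dictionary lookups and concatenation, synthetic speech of this kind could be produced during training rather than by running a TTS system over every MT dataset in advance, which is the cost the abstract highlights.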

Paper title