Unconditional Audio Generation with Generative Adversarial Networks and Cycle Regularization

Paper title

Unconditional Audio Generation with Generative Adversarial Networks and Cycle Regularization

Authors

Jen-Yu Liu, Yu-Hua Chen, Yin-Cheng Yeh, Yi-Hsuan Yang

Abstract

In a recent paper, we presented a generative adversarial network (GAN)-based model for unconditional generation of the mel-spectrograms of singing voices. Because the generator of the model is designed to take a variable-length sequence of noise vectors as input, it can generate mel-spectrograms of variable length. However, our previous listening test shows that the quality of the generated audio leaves room for improvement. The present paper extends that previous work in the following aspects. First, we employ a hierarchical architecture in the generator to induce some structure in the temporal dimension. Second, we introduce a cycle regularization mechanism to the generator to avoid mode collapse. Third, we evaluate the performance of the new model not only for generating singing voices, but also for generating speech voices. Evaluation results show that the new model outperforms the prior one both objectively and subjectively. We also employ the model to unconditionally generate sequences of piano and violin music and find the results promising. Audio examples, as well as the code for implementing our model, will be publicly available online upon paper publication.
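
The abstract names two ideas that a toy example may make more concrete: a generator whose output length follows the length of its input noise sequence, and a cycle regularization term. The sketch below is only an illustration under stated assumptions, not the authors' implementation. It assumes that cycle regularization means recovering the input noise from the generated mel-spectrogram with an encoder and penalizing the reconstruction error, that each noise vector yields exactly one mel frame, and that both networks are plain linear maps. All names (`generate`, `encode`, `cycle_loss`, `NOISE_DIM`, `N_MELS`) are hypothetical, and no adversarial training is shown.

```python
import random


def make_matrix(rows, cols, rng):
    """Random weight matrix stored as a list of rows."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def apply(matrix, vec):
    """Matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def generate(noise_seq, g_weights):
    """Toy generator: one mel frame per noise vector (an assumption),
    so the output length follows the input length."""
    return [apply(g_weights, z) for z in noise_seq]


def encode(mel_seq, e_weights):
    """Toy encoder: maps each mel frame back to noise space."""
    return [apply(e_weights, frame) for frame in mel_seq]


def cycle_loss(noise_seq, recovered_seq):
    """Mean absolute error between the input noise and the recovered noise."""
    total, count = 0.0, 0
    for z, z_hat in zip(noise_seq, recovered_seq):
        for a, b in zip(z, z_hat):
            total += abs(a - b)
            count += 1
    return total / count


rng = random.Random(0)
NOISE_DIM, N_MELS = 4, 8  # hypothetical sizes, chosen for illustration
g_weights = make_matrix(N_MELS, NOISE_DIM, rng)
e_weights = make_matrix(NOISE_DIM, N_MELS, rng)

for n_steps in (3, 10):
    noise = [[rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)] for _ in range(n_steps)]
    mel = generate(noise, g_weights)
    loss = cycle_loss(noise, encode(mel, e_weights))
    print(n_steps, len(mel), len(mel[0]), loss)
```

In a real training setup, a term like `cycle_loss` would be added to the generator's adversarial loss, so that distinct noise inputs are pushed toward distinguishable outputs; this is one common way such a term can discourage mode collapse.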

Paper title