MelGlow: Efficient Waveform Generative Network Based on Location-Variable Convolution

Paper Title

MelGlow: Efficient Waveform Generative Network Based on Location-Variable Convolution

Authors

Zeng, Zhen; Wang, Jianzong; Cheng, Ning; Xiao, Jing

Abstract

Recent neural vocoders usually use a WaveNet-like network to capture the long-term dependencies of the waveform, but such networks require a large number of parameters to obtain good modeling capabilities. In this paper, an efficient network, named location-variable convolution, is proposed to model the dependencies of waveforms. Unlike WaveNet, which uses unified convolution kernels to capture the dependencies of arbitrary waveforms, location-variable convolution uses a kernel predictor to generate multiple sets of convolution kernels based on the mel-spectrum, where each set of kernels is used to perform convolution operations on its associated waveform interval. Combining WaveGlow and location-variable convolution, an efficient vocoder, named MelGlow, is designed. Experiments on the LJSpeech dataset show that MelGlow achieves better performance than WaveGlow at small model sizes, which verifies the effectiveness and potential optimization space of location-variable convolution.
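The core idea in the abstract, one predicted kernel per waveform interval, can be illustrated with a minimal single-channel sketch in plain Python. This is not the paper's implementation: the linear `predict_kernels` function, the `weights` matrix, the `hop` parameter, and the toy two-bin "mel" frames are all assumptions made for illustration, and the real model uses a learned kernel predictor and multi-channel convolutions.

```python
def predict_kernels(mel_frames, weights):
    """Hypothetical stand-in for the kernel predictor.

    mel_frames: list of frames, each a list of n_mels floats.
    weights:    n_mels x kernel_size matrix (assumed linear map).
    Returns one kernel (list of kernel_size floats) per frame.
    """
    kernel_size = len(weights[0])
    return [
        [sum(frame[m] * weights[m][k] for m in range(len(frame)))
         for k in range(kernel_size)]
        for frame in mel_frames
    ]


def location_variable_conv(waveform, kernels, hop):
    """Filter each hop-sized waveform interval with its own kernel.

    Interval i covers samples [i * hop, (i + 1) * hop) and uses kernels[i].
    The waveform is zero-padded so the output has the same length.
    """
    kernel_size = len(kernels[0])
    if kernel_size % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel size")
    if len(waveform) != len(kernels) * hop:
        raise ValueError("waveform length must equal n_frames * hop")
    pad = kernel_size // 2
    padded = [0.0] * pad + list(waveform) + [0.0] * pad
    out = []
    for i, kernel in enumerate(kernels):
        for t in range(i * hop, (i + 1) * hop):
            # Output sample t is a weighted sum of the window centered on t.
            out.append(sum(padded[t + j] * kernel[j] for j in range(kernel_size)))
    return out


# Toy example: two conditioning frames, four waveform samples per frame.
mel_frames = [[1.0, 0.0], [0.0, 1.0]]
weights = [[0.0, 1.0, 0.0],   # frame 0 maps to an identity kernel
           [0.5, 0.0, 0.5]]   # frame 1 maps to a neighbor-averaging kernel
waveform = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

kernels = predict_kernels(mel_frames, weights)
output = location_variable_conv(waveform, kernels, hop=4)
print(output)  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 3.5]
```

In this toy run the first interval passes through unchanged while the second is smoothed, which shows how the conditioning frame, rather than a single shared kernel, decides the filtering applied to each stretch of the waveform.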
