StyleWaveGAN: Style-based synthesis of drum sounds with extensive controls using generative adversarial networks

Paper title

StyleWaveGAN: Style-based synthesis of drum sounds with extensive controls using generative adversarial networks

Authors

Antoine Lavault, Axel Roebel, Matthieu Voiry

Abstract

In this paper we introduce StyleWaveGAN, a style-based drum sound generator that is a variation of StyleGAN, a state-of-the-art image generator. By conditioning StyleWaveGAN on both the type of drum and several audio descriptors, we are able to synthesize waveforms of up to 1.5 s directly in CD quality, faster than real time on a GPU, while retaining a considerable amount of control over the generation. We also introduce an alternative to the progressive growing of GANs and experiment with the effect of dataset balancing on generative tasks. The experiments are carried out on an augmented subset of a publicly available dataset comprising different drums and cymbals. We evaluate against two recent drum generators, WaveGAN and NeuroDrum, and demonstrate significantly improved generation quality (measured with the Fréchet Audio Distance) and interesting results with perceptual features.
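
The abstract reports quality in terms of the Fréchet Audio Distance. As a rough illustration of what that metric computes, the sketch below evaluates the Fréchet distance between Gaussians fitted to two sets of audio embeddings. It assumes the embeddings have already been extracted by some audio model (the abstract does not say which one), and the function name `frechet_distance` and its arguments are hypothetical, not taken from the paper.

```python
import numpy as np

def frechet_distance(emb_ref, emb_gen):
    """Frechet distance between Gaussians fitted to two embedding sets.

    Both inputs are assumed to be arrays of shape (n_clips, dim) with
    dim >= 2, holding precomputed embeddings (an assumption of this sketch).
    """
    mu_r, mu_g = emb_ref.mean(axis=0), emb_gen.mean(axis=0)
    cov_r = np.cov(emb_ref, rowvar=False)
    cov_g = np.cov(emb_gen, rowvar=False)
    # Tr((cov_r cov_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g, which are real and non-negative for
    # positive semi-definite covariances; clip guards against rounding.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_sqrt)
```

A lower value means the generated embeddings are statistically closer to the reference set; two identical sets give a distance of approximately zero.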
