Paper Title
not-so-BigGAN: Generating High-Fidelity Images on Small Compute with Wavelet-based Super-Resolution
Paper Authors
Paper Abstract
State-of-the-art models for high-resolution image generation, such as BigGAN and VQVAE-2, require an enormous amount of compute resources and/or time (512 TPU-v3 cores) to train, putting them out of reach for the larger research community. On the other hand, GAN-based image super-resolution models, such as ESRGAN, not only can upscale images to high resolutions but are also efficient to train. In this paper, we present not-so-big-GAN (nsb-GAN), a simple yet cost-effective two-step training framework for deep generative models (DGMs) of high-dimensional natural images. First, we generate images in the low-frequency bands by training a sampler in the wavelet domain. Then, we super-resolve these images from the wavelet domain back to pixel space with our novel wavelet super-resolution decoder network. Wavelet-based down-sampling preserves more structural information than pixel-based methods, leading to significantly better generative quality from the low-resolution sampler (e.g., 64x64). Since the sampler and decoder can be trained in parallel and operate on much lower-dimensional spaces than end-to-end models, the training cost is substantially reduced. On ImageNet 512x512, our model achieves a Fréchet Inception Distance (FID) of 10.59 -- beating the baseline BigGAN model -- at half the compute (256 TPU-v3 cores).
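To illustrate the wavelet-based down-sampling the abstract refers to, here is a minimal sketch (not the authors' code): a 2D discrete wavelet transform is applied repeatedly and only the low-frequency (LL) sub-band is kept, so that, e.g., a 512x512 image becomes a 64x64 low-frequency image after three levels. The `haar` wavelet, the three-level setting, and the helper name `wavelet_downsample` are assumptions for illustration only.

```python
# Minimal sketch of wavelet-based down-sampling via repeated 2D DWT (PyWavelets).
# Assumptions: 'haar' wavelet, 3 decomposition levels (512 -> 64), single-channel input.
import numpy as np
import pywt

def wavelet_downsample(image: np.ndarray, levels: int = 3, wavelet: str = "haar") -> np.ndarray:
    """Keep only the low-frequency (LL) sub-band at each level; each level halves H and W."""
    low = image
    for _ in range(levels):
        low, _details = pywt.dwt2(low, wavelet)  # discard LH/HL/HH detail bands
    return low

# Example: a 512x512 image maps to a 64x64 low-frequency band after 3 levels.
img = np.random.rand(512, 512)
print(wavelet_downsample(img).shape)  # (64, 64)
```

A sampler trained on such low-frequency bands operates on a much smaller space; the decoder's job, as described in the abstract, is the inverse mapping from the wavelet domain back to full-resolution pixel space.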