Paper Title

GAN Compression: Efficient Architectures for Interactive Conditional GANs

Paper Authors

Muyang Li, Ji Lin, Yaoyao Ding, Zhijian Liu, Jun-Yan Zhu, Song Han

Paper Abstract

Conditional Generative Adversarial Networks (cGANs) have enabled controllable image synthesis for many vision and graphics applications. However, recent cGANs are 1-2 orders of magnitude more compute-intensive than modern recognition CNNs. For example, GauGAN consumes 281G MACs per image, compared to 0.44G MACs for MobileNet-v3, making it difficult for interactive deployment. In this work, we propose a general-purpose compression framework for reducing the inference time and model size of the generator in cGANs. Directly applying existing compression methods yields poor performance due to the difficulty of GAN training and the differences in generator architectures. We address these challenges in two ways. First, to stabilize GAN training, we transfer knowledge of multiple intermediate representations of the original model to its compressed model and unify unpaired and paired learning. Second, instead of reusing existing CNN designs, our method finds efficient architectures via neural architecture search. To accelerate the search process, we decouple the model training and search via weight sharing. Experiments demonstrate the effectiveness of our method across different supervision settings, network architectures, and learning methods. Without losing image quality, we reduce the computation of CycleGAN by 21x, Pix2pix by 12x, MUNIT by 29x, and GauGAN by 9x, paving the way for interactive image synthesis.
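The abstract mentions transferring knowledge of multiple intermediate representations from the original generator to its compressed counterpart. Below is a minimal PyTorch sketch of what such an intermediate-feature distillation loss can look like; it is an illustrative assumption rather than the authors' released implementation, and the class name, channel widths, and the learnable 1x1 mapping layers are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class IntermediateDistillationLoss(nn.Module):
    """Match intermediate feature maps of a compressed (student) generator
    to those of the original (teacher) generator.

    Because the student is narrower than the teacher, a learnable 1x1
    convolution first maps each student feature map to the teacher's channel
    count; the loss is the MSE between the mapped student features and the
    (detached) teacher features.
    """

    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        # One 1x1 mapping layer per distilled intermediate layer
        # (channel counts here are hypothetical).
        self.mappings = nn.ModuleList(
            nn.Conv2d(s, t, kernel_size=1)
            for s, t in zip(student_channels, teacher_channels)
        )

    def forward(self, student_feats, teacher_feats):
        loss = 0.0
        for mapping, f_s, f_t in zip(self.mappings, student_feats, teacher_feats):
            loss = loss + F.mse_loss(mapping(f_s), f_t.detach())
        return loss


if __name__ == "__main__":
    # Toy example with two distilled layers.
    distill = IntermediateDistillationLoss(student_channels=[32, 64],
                                           teacher_channels=[64, 128])
    student_feats = [torch.randn(1, 32, 64, 64), torch.randn(1, 64, 32, 32)]
    teacher_feats = [torch.randn(1, 64, 64, 64), torch.randn(1, 128, 32, 32)]
    print(distill(student_feats, teacher_feats))  # scalar distillation loss
```

In the full framework described by the abstract, a term of this kind would be combined with the cGAN's adversarial and paired/unpaired reconstruction objectives, while the compressed generator's channel widths themselves are selected by a weight-sharing neural architecture search rather than fixed by hand.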
