Paper Title


UVCGAN: UNet Vision Transformer cycle-consistent GAN for unpaired image-to-image translation

Authors

Dmitrii Torbunov, Yi Huang, Haiwang Yu, Jin Huang, Shinjae Yoo, Meifeng Lin, Brett Viren, Yihui Ren

Abstract


Unpaired image-to-image translation has broad applications in art, design, and scientific simulations. One early breakthrough was CycleGAN that emphasizes one-to-one mappings between two unpaired image domains via generative-adversarial networks (GAN) coupled with the cycle-consistency constraint, while more recent works promote one-to-many mapping to boost diversity of the translated images. Motivated by scientific simulation and one-to-one needs, this work revisits the classic CycleGAN framework and boosts its performance to outperform more contemporary models without relaxing the cycle-consistency constraint. To achieve this, we equip the generator with a Vision Transformer (ViT) and employ necessary training and regularization techniques. Compared to previous best-performing models, our model performs better and retains a strong correlation between the original and translated image. An accompanying ablation study shows that both the gradient penalty and self-supervised pre-training are crucial to the improvement. To promote reproducibility and open science, the source code, hyperparameter configurations, and pre-trained model are available at https://github.com/LS4GAN/uvcgan.
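The cycle-consistency constraint the abstract refers to penalizes the round trip A→B→A for not recovering the original image. A minimal sketch of that loss, using NumPy and hypothetical toy generators in place of the paper's ViT-UNet generators (the function names `g_ab`, `g_ba`, and `cycle_consistency_loss` are illustrative, not from the paper's code):

```python
import numpy as np

def cycle_consistency_loss(x, g_ab, g_ba):
    """L1 cycle-consistency: translating A -> B -> A should recover x."""
    return float(np.mean(np.abs(g_ba(g_ab(x)) - x)))

# Toy "generators": a pair of exact inverse affine maps standing in for
# the learned A->B and B->A networks.
g_ab = lambda x: 2.0 * x + 1.0
g_ba = lambda y: (y - 1.0) / 2.0

x = np.ones((4, 4))
loss = cycle_consistency_loss(x, g_ab, g_ba)
print(loss)  # exact inverses, so the cycle loss is 0.0
```

In training, this term is added to the adversarial GAN losses; keeping it strict (rather than relaxing it for one-to-many diversity) is what preserves the strong correlation between original and translated images that the abstract emphasizes.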
