Paper Title
InterFaceGAN: Interpreting the Disentangled Face Representation Learned by GANs
Paper Authors
Paper Abstract
Although Generative Adversarial Networks (GANs) have made significant progress in face synthesis, there is still not enough understanding of what GANs learn in the latent representation in order to map a random code to a photo-realistic image. In this work, we propose a framework called InterFaceGAN to interpret the disentangled face representation learned by state-of-the-art GAN models and to study the properties of the facial semantics encoded in the latent space. We first find that GANs learn various semantics in some linear subspaces of the latent space. After identifying these subspaces, we can realistically manipulate the corresponding facial attributes without retraining the model. We then conduct a detailed study of the correlation between different semantics and manage to better disentangle them via subspace projection, resulting in more precise control of the attribute manipulation. Besides manipulating gender, age, expression, and the presence of eyeglasses, we can even alter the face pose and fix artifacts accidentally produced by GANs. Furthermore, we perform an in-depth face identity analysis and a layer-wise analysis to evaluate the editing results quantitatively. Finally, we apply our approach to real face editing by employing GAN inversion approaches and by explicitly training feed-forward models on the synthetic data established by InterFaceGAN. Extensive experimental results suggest that learning to synthesize faces spontaneously brings a disentangled and controllable face representation.
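The two operations the abstract names, editing along a linear direction and disentangling via subspace projection, can be illustrated with a small sketch. It assumes that each attribute is summarized by a single direction vector in latent space and that "subspace projection" means removing one direction's component along another; the abstract does not spell out either detail. The function names (`edit_latent`, `project_out`) and the toy 3-D vectors are hypothetical, and no GAN is involved here, only the vector arithmetic.

```python
import math


def normalize(v):
    """Return v scaled to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def edit_latent(z, direction, alpha):
    """Move latent code z by alpha along the unit-normalized direction."""
    n = normalize(direction)
    return [zi + alpha * ni for zi, ni in zip(z, n)]


def project_out(primary, condition):
    """Remove from `primary` its component along `condition`.

    Moving along the returned unit vector leaves the inner product
    with `condition` unchanged (assumed reading of "subspace projection").
    """
    n1 = normalize(primary)
    n2 = normalize(condition)
    overlap = dot(n1, n2)
    residual = [a - overlap * b for a, b in zip(n1, n2)]
    return normalize(residual)  # raises if the two directions are parallel


# Toy example: hypothetical, deliberately entangled attribute directions.
z = [0.5, -1.0, 2.0]
age_dir = [1.0, 1.0, 0.0]
glasses_dir = [0.0, 1.0, 0.0]

age_only = project_out(age_dir, glasses_dir)
z_edit = edit_latent(z, age_only, alpha=3.0)
```

In this toy setup the edited code `z_edit` has the same inner product with `glasses_dir` as the original `z`, which is the sense in which the projected direction changes one attribute score while holding the other fixed. In practice the directions would have to be estimated from a trained generator, which this sketch does not attempt.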