Paper Title
LEGAN: Disentangled Manipulation of Directional Lighting and Facial Expressions by Leveraging Human Perceptual Judgements
Paper Authors
Paper Abstract
Building facial analysis systems that generalize to extreme variations in lighting and facial expressions is a challenging problem that can potentially be alleviated using natural-looking synthetic data. Toward that end, we propose LEGAN, a novel synthesis framework that leverages perceptual quality judgments to jointly manipulate lighting and expressions in face images, without requiring paired training data. LEGAN disentangles the lighting and expression subspaces and performs transformations in the feature space before upscaling to the desired output image. The fidelity of the synthetic image is further refined by integrating a perceptual quality estimation model, trained on face images rendered using multiple synthesis methods and their crowd-sourced naturalness ratings, into the LEGAN framework as an auxiliary discriminator. Using objective metrics like FID and LPIPS, LEGAN is shown to generate higher-quality face images than popular GAN models such as StarGAN and StarGAN-v2 for lighting and expression synthesis. We also conduct a perceptual study using images synthesized by LEGAN and other GAN models and show the correlation between our quality estimation and visual fidelity. Finally, we demonstrate the effectiveness of LEGAN as a training data augmenter for expression recognition and face verification tasks.
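The auxiliary-discriminator idea in the abstract can be made concrete with a minimal sketch. This is not the paper's implementation: the non-saturating adversarial loss, the log-penalty on the quality score, and the weight `lambda_quality` are all illustrative assumptions about how a generator objective might combine realism with a predicted naturalness rating.

```python
import math

def generator_loss(adv_score, quality_score, lambda_quality=0.5):
    """Hypothetical combined generator objective (illustrative only).

    adv_score      -- main discriminator's probability that the synthesized
                      image is real, in (0, 1]
    quality_score  -- auxiliary model's predicted naturalness rating,
                      normalized to (0, 1]
    lambda_quality -- assumed weight on the quality term (not from the paper)
    """
    # Non-saturating GAN term: -log D(G(z)); clamp to avoid log(0)
    adv_loss = -math.log(max(adv_score, 1e-8))
    # Auxiliary term: penalize images the quality model rates as unnatural
    quality_loss = -math.log(max(quality_score, 1e-8))
    return adv_loss + lambda_quality * quality_loss
```

A perfectly realistic, perfectly natural image (both scores 1.0) incurs zero loss; either score dropping raises the objective, so the generator is pushed toward outputs that satisfy both critics.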