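CLIP-PAE: Projection-Augmentation Embedding to Extract Relevant Features for a Disentangled, Interpretable, and Controllable Text-Guided Face Manipulation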

Paper Title

CLIP-PAE: Projection-Augmentation Embedding to Extract Relevant Features for a Disentangled, Interpretable, and Controllable Text-Guided Face Manipulation

Authors

Chenliang Zhou, Fangcheng Zhong, Cengiz Oztireli

Abstract

Recently introduced Contrastive Language-Image Pre-Training (CLIP) bridges images and text by embedding them into a joint latent space. This opens the door to ample literature that aims to manipulate an input image by providing a textual explanation. However, due to the discrepancy between image and text embeddings in the joint space, using text embeddings as the optimization target often introduces undesired artifacts in the resulting images. Disentanglement, interpretability, and controllability are also hard to guarantee for manipulation. To alleviate these problems, we propose to define corpus subspaces spanned by relevant prompts to capture specific image characteristics. We introduce CLIP Projection-Augmentation Embedding (PAE) as an optimization target to improve the performance of text-guided image manipulation. Our method is a simple and general paradigm that can be easily computed and adapted, and smoothly incorporated into any CLIP-based image manipulation algorithm. To demonstrate the effectiveness of our method, we conduct several theoretical and empirical studies. As a case study, we utilize the method for text-guided semantic face editing. We quantitatively and qualitatively demonstrate that PAE facilitates a more disentangled, interpretable, and controllable image manipulation with state-of-the-art quality and accuracy. Project page: https://chenliang-zhou.github.io/CLIP-PAE/.
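
The abstract names two steps, projection onto a corpus subspace spanned by relevant prompts and augmentation of an embedding, but does not give the formula. The sketch below is one plausible reading, not the authors' implementation: it assumes the text embedding is orthogonally projected onto the span of the prompt embeddings and then added to the image embedding with a weight. The helper names, the `alpha` weight, the final normalization, and the toy 4-dimensional vectors standing in for CLIP embeddings are all hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]

def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        if math.sqrt(dot(w, w)) > tol:  # skip linearly dependent prompts
            basis.append(normalize(w))
    return basis

def project(v, basis):
    """Orthogonal projection of v onto the span of an orthonormal basis."""
    out = [0.0] * len(v)
    for b in basis:
        c = dot(v, b)
        out = [o + c * bi for o, bi in zip(out, b)]
    return out

def projection_augmented_target(image_emb, text_emb, corpus_embs, alpha=1.0):
    """Hypothetical target: image embedding plus the projected text embedding.

    Assumption: `alpha` scales the edit strength and the result is
    re-normalized to unit length; neither detail is stated in the abstract.
    """
    basis = orthonormal_basis(corpus_embs)
    direction = project(text_emb, basis)
    target = [i + alpha * d for i, d in zip(image_emb, direction)]
    return normalize(target)

# Toy stand-ins for CLIP embeddings (real ones are much higher-dimensional).
corpus = [[1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]  # "relevant prompts"
image = [0.0, 0.0, 1.0, 0.0]
text = [0.5, 0.5, 0.0, 0.7]
target = projection_augmented_target(image, text, corpus, alpha=1.0)
```

In this toy setup the fourth component of `text` lies outside the corpus subspace, so the projection discards it and it does not reach `target`. Under the stated assumptions, that is how restricting the target to a subspace could suppress text-embedding components unrelated to the chosen attribute.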
