Paper Title

Diffusion Video Autoencoders: Toward Temporally Consistent Face Video Editing via Disentangled Video Encoding

Paper Authors

Gyeongman Kim, Hajin Shim, Hyunsu Kim, Yunjey Choi, Junho Kim, Eunho Yang

Paper Abstract

Inspired by the impressive performance of recent face image editing methods, several studies have naturally been proposed to extend these methods to the face video editing task. One of the main challenges here is temporal consistency among edited frames, which remains unresolved. To this end, we propose a novel face video editing framework based on diffusion autoencoders that can successfully extract decomposed features of identity and motion from a given video, for the first time as a face video editing model. This modeling allows us to edit the video consistently by simply manipulating the temporally invariant identity feature in the desired direction. Another unique strength of our model is that, since it is based on diffusion models, it can satisfy both reconstruction and editing capabilities at the same time, and, unlike existing GAN-based methods, it is robust to corner cases in wild face videos (e.g., occluded faces).
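To make the editing mechanism concrete, below is a minimal sketch of the pipeline the abstract describes: per-frame features are split into an identity part and a motion part, the identity part is collapsed into one time-invariant code, edited once with a linear attribute direction, and decoded back alongside each frame's own motion code. Everything in the sketch (ToyEncoder, the toy decoder, the direction vector, and the feature dimensions) is a hypothetical placeholder, not the authors' released implementation; in the actual model, decoding would be the reverse process of a diffusion model conditioned on these latents.

```python
import torch
import torch.nn as nn

class ToyEncoder(nn.Module):
    """Stand-in per-frame encoder that maps frames to flat feature vectors.

    Hypothetical placeholder for the paper's pretrained identity/motion
    encoders; it exists only to make the sketch self-contained and runnable.
    """
    def __init__(self, out_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.LazyLinear(out_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def edit_video(frames, id_encoder, motion_encoder, decoder, direction, scale=1.0):
    """Edit a face video by manipulating one shared identity feature.

    frames: (T, C, H, W) tensor of video frames.

    The identity code is averaged over time so that a single edited code is
    decoded together with every frame's motion code; sharing this one
    time-invariant feature is what the abstract credits for temporal
    consistency among edited frames.
    """
    z_id = id_encoder(frames)          # (T, D_id): nearly constant across t
    z_motion = motion_encoder(frames)  # (T, D_m): varies frame to frame

    # Collapse to one time-invariant identity code and edit it once.
    z_id_shared = z_id.mean(dim=0, keepdim=True)   # (1, D_id)
    z_id_edited = z_id_shared + scale * direction  # linear latent edit

    # Decode every frame from the same edited identity + its own motion.
    z_id_broadcast = z_id_edited.expand(len(frames), -1)
    return decoder(torch.cat([z_id_broadcast, z_motion], dim=1))
```

A toy usage with random tensors (a real system would plug in the paper's pretrained encoders and the diffusion decoder here):

```python
frames = torch.randn(8, 3, 64, 64)                      # dummy 8-frame clip
id_encoder, motion_encoder = ToyEncoder(512), ToyEncoder(128)
decoder = nn.Sequential(nn.Linear(512 + 128, 3 * 64 * 64),
                        nn.Unflatten(1, (3, 64, 64)))   # toy stand-in decoder
direction = torch.randn(1, 512)  # hypothetical attribute direction (e.g. "beard")
edited = edit_video(frames, id_encoder, motion_encoder, decoder, direction, 0.8)
print(edited.shape)  # torch.Size([8, 3, 64, 64])
```

Because the same edited identity code is reused for every frame, the edit is identical across frames by construction; only the per-frame motion codes differ, which is the temporal-consistency argument made in the abstract.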
