Paper Title

DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis

Paper Authors

Yinghao Xu, Menglei Chai, Zifan Shi, Sida Peng, Ivan Skorokhodov, Aliaksandr Siarohin, Ceyuan Yang, Yujun Shen, Hsin-Ying Lee, Bolei Zhou, Sergey Tulyakov

Paper Abstract

Existing 3D-aware image synthesis approaches mainly focus on generating a single canonical object and show limited capacity in composing a complex scene containing a variety of objects. This work presents DisCoScene: a 3D-aware generative model for high-quality and controllable scene synthesis. The key ingredient of our method is a very abstract object-level representation (i.e., 3D bounding boxes without semantic annotation) as the scene layout prior, which is simple to obtain, general to describe various scene contents, and yet informative to disentangle objects and background. Moreover, it serves as an intuitive user control for scene editing. Based on such a prior, the proposed model spatially disentangles the whole scene into object-centric generative radiance fields by learning on only 2D images with the global-local discrimination. Our model obtains the generation fidelity and editing flexibility of individual objects while being able to efficiently compose objects and the background into a complete scene. We demonstrate state-of-the-art performance on many scene datasets, including the challenging Waymo outdoor dataset. Project page: https://snap-research.github.io/discoscene/
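
The abstract describes the core mechanism at a high level: 3D bounding boxes act as a layout prior that spatially disentangles the scene into object-centric radiance fields, which are then composed with a background field and volume-rendered. The sketch below only illustrates that composition idea under simplifying assumptions (toy NumPy density/color fields instead of learned generators, one yaw angle per box, no global-local discriminators); the box format and every function name here are hypothetical and not taken from the DisCoScene codebase.

```python
# Minimal sketch: composing object-centric radiance fields from a 3D-bounding-box
# layout prior, plus a background field, then volume rendering one ray.
# All field functions are toy stand-ins, not the paper's learned generators.
import numpy as np

def world_to_box(points, center, size, yaw):
    """Map world-space points into a box's normalized local frame ([-1, 1]^3)."""
    c, s = np.cos(-yaw), np.sin(-yaw)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])  # rotation about z
    local = (points - center) @ R.T
    return local / (0.5 * size)  # scale so the box spans [-1, 1] on each axis

def toy_object_field(local_pts, latent):
    """Stand-in for a per-object generative radiance field conditioned on a latent code."""
    density = np.exp(-np.sum(local_pts ** 2, axis=-1)) * (1.0 + abs(latent[0]))
    rgb = np.abs(latent[:3]) / (np.abs(latent[:3]).max() + 1e-8)
    return density, np.tile(rgb, (len(local_pts), 1))

def compose_scene(points, boxes, latents, bg_field):
    """Accumulate density/color from every object's field plus the background."""
    density = np.zeros(len(points))
    color = np.zeros((len(points), 3))
    for (center, size, yaw), z in zip(boxes, latents):
        local = world_to_box(points, np.array(center), np.array(size), yaw)
        inside = np.all(np.abs(local) <= 1.0, axis=-1)  # the box bounds the object's support
        if not inside.any():
            continue
        d, c = toy_object_field(local[inside], z)
        color[inside] = (color[inside] * density[inside, None] + c * d[:, None]) / (
            density[inside, None] + d[:, None] + 1e-8)
        density[inside] += d
    bg_d, bg_c = bg_field(points)
    color = (color * density[:, None] + bg_c * bg_d[:, None]) / (
        density[:, None] + bg_d[:, None] + 1e-8)
    return density + bg_d, color

def render_ray(origin, direction, boxes, latents, bg_field, near=0.5, far=6.0, n=64):
    """Plain volume rendering of one camera ray through the composed scene."""
    t = np.linspace(near, far, n)
    pts = origin[None] + t[:, None] * direction[None]
    density, color = compose_scene(pts, boxes, latents, bg_field)
    delta = np.diff(t, append=t[-1] + (t[-1] - t[-2]))
    alpha = 1.0 - np.exp(-density * delta)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1] + 1e-10]))
    return (alpha * trans)[:, None].__mul__(color).sum(axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    boxes = [((1.0, 0.0, 0.0), (1.0, 1.0, 1.0), 0.3),    # (center, size, yaw) per object
             ((-1.0, 0.5, 0.0), (0.8, 1.2, 0.9), -0.5)]
    latents = [rng.standard_normal(8) for _ in boxes]     # one latent code per object
    bg = lambda p: (np.full(len(p), 0.02), np.full((len(p), 3), 0.5))
    pixel = render_ray(np.array([0.0, -4.0, 0.5]), np.array([0.0, 1.0, 0.0]), boxes, latents, bg)
    print("rendered RGB:", pixel)
```

Editing a scene in this framing amounts to moving, resizing, or removing a bounding box (or swapping an object's latent code) and re-rendering, which is the kind of user control the abstract refers to.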
