Paper Title
NeuVV: Neural Volumetric Videos with Immersive Rendering and Editing
Paper Authors
Paper Abstract
Some of the most exciting experiences that the Metaverse promises to offer, for instance, live interactions with virtual characters in virtual environments, require real-time photo-realistic rendering. 3D reconstruction approaches to rendering, active or passive, still require extensive cleanup work to fix the meshes or point clouds. In this paper, we present a neural volumography technique called neural volumetric video, or NeuVV, to support immersive, interactive, and spatial-temporal rendering of volumetric video contents with photo-realism and in real time. The core of NeuVV is to efficiently encode a dynamic neural radiance field (NeRF) into renderable and editable primitives. We introduce two types of factorization schemes: a hyper-spherical harmonics (HH) decomposition for modeling smooth color variations over space and time, and a learnable basis representation for modeling abrupt density and color changes caused by motion. The NeuVV factorization can be integrated into a Video Octree (VOctree), analogous to PlenOctree, to significantly accelerate training while reducing memory overhead. Real-time NeuVV rendering further enables a class of immersive content editing tools. Specifically, NeuVV treats each VOctree as a primitive and implements volume-based depth ordering and alpha blending to realize spatial-temporal compositions for content re-purposing. For example, we demonstrate positioning varied manifestations of the same performance at different 3D locations with different timing, adjusting the color/texture of the performer's clothing, casting spotlight shadows, and synthesizing distance falloff lighting, etc., all at interactive speed. We further develop a hybrid neural-rasterization rendering framework to support consumer-level VR headsets so that the aforementioned volumetric video viewing and editing, for the first time, can be conducted immersively in virtual 3D space.
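The abstract states that NeuVV treats each VOctree as a primitive and uses volume-based depth ordering and alpha blending to composite multiple volumetric videos in one scene. Below is a minimal sketch of that general idea using standard emission-absorption volume rendering along a single ray; the function name, sample layout, and NumPy implementation are illustrative assumptions for exposition, not NeuVV's actual interface.

import numpy as np

def composite_samples(depths, colors, densities, deltas):
    """Depth-ordered emission-absorption alpha compositing along one ray.

    depths:    (N,) sample depths along the ray (any order; sorted below)
    colors:    (N, 3) RGB radiance decoded at each sample
    densities: (N,) volume density (sigma) at each sample
    deltas:    (N,) distance to the next sample along the ray
    All names here are hypothetical, not part of NeuVV's API.
    """
    # Depth-order samples drawn from *all* primitives jointly, so that
    # overlapping volumes occlude each other correctly.
    order = np.argsort(depths)
    colors, densities, deltas = colors[order], densities[order], deltas[order]

    alpha = 1.0 - np.exp(-densities * deltas)            # per-sample opacity
    transmittance = np.cumprod(
        np.concatenate([[1.0], 1.0 - alpha[:-1]]))       # light surviving so far
    weights = alpha * transmittance
    return (weights[:, None] * colors).sum(axis=0)       # final pixel RGB

# Usage: merge samples gathered from two independently posed volumetric
# primitives (random stand-ins here) and composite them as a single ray.
rng = np.random.default_rng(0)
n1, n2 = 16, 16
depths = np.concatenate([np.linspace(1, 3, n1), np.linspace(2, 4, n2)])
colors = rng.uniform(size=(n1 + n2, 3))
densities = rng.uniform(0.0, 5.0, size=n1 + n2)
deltas = np.full(n1 + n2, 0.1)
print(composite_samples(depths, colors, densities, deltas))

Because the per-sample weights depend only on densities accumulated in depth order, samples from separately trained and separately positioned volumes can be merged at render time without retraining, which is the property the abstract leverages for spatial-temporal content re-purposing.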