Title
D$^2$NeRF: Self-Supervised Decoupling of Dynamic and Static Objects from a Monocular Video
Authors
Abstract
Given a monocular video, segmenting and decoupling dynamic objects while recovering the static environment is a widely studied problem in machine intelligence. Existing solutions usually approach this problem in the image domain, limiting their performance and understanding of the environment. We introduce Decoupled Dynamic Neural Radiance Field (D$^2$NeRF), a self-supervised approach that takes a monocular video and learns a 3D scene representation which decouples moving objects, including their shadows, from the static background. Our method represents the moving objects and the static background by two separate neural radiance fields with only one allowing for temporal changes. A naive implementation of this approach leads to the dynamic component taking over the static one as the representation of the former is inherently more general and prone to overfitting. To this end, we propose a novel loss to promote correct separation of phenomena. We further propose a shadow field network to detect and decouple dynamically moving shadows. We introduce a new dataset containing various dynamic objects and shadows and demonstrate that our method can achieve better performance than state-of-the-art approaches in decoupling dynamic and static 3D objects, occlusion and shadow removal, and image segmentation for moving objects.
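The two-field design described above — densities from a static and a dynamic radiance field combined along each ray, with a shadow ratio attenuating the static color — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `composite_ray`, the argument names, and the exact blending formula are assumptions for exposition; only the overall idea (additive densities, one time-varying field, a shadow ratio darkening the static radiance) comes from the abstract.

```python
import numpy as np

def composite_ray(sigma_s, color_s, sigma_d, color_d, rho, deltas):
    """Blend one static and one dynamic radiance field along a single ray.

    sigma_s, sigma_d : per-sample densities of the static / dynamic field, shape (n,)
    color_s, color_d : per-sample RGB radiance, shape (n, 3)
    rho              : per-sample shadow ratio in [0, 1] that darkens the
                       static color (hypothetical name for the shadow field output)
    deltas           : distances between adjacent ray samples, shape (n,)
    """
    sigma = sigma_s + sigma_d                      # densities add along the ray
    alpha = 1.0 - np.exp(-sigma * deltas)          # per-sample opacity
    # transmittance: probability the ray reaches each sample unoccluded
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    # density-weighted mix of the two colors; the shadow ratio attenuates
    # only the static component
    w_d = sigma_d / np.maximum(sigma, 1e-10)
    color = (w_d[:, None] * color_d
             + (1.0 - w_d)[:, None] * (1.0 - rho)[:, None] * color_s)
    return np.sum((trans * alpha)[:, None] * color, axis=0)
```

With `rho` near 1 the static radiance is suppressed at those samples, which is how a moving shadow can darken the background without the dynamic field having to model the background's geometry itself.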