im2nerf: Image to Neural Radiance Field in the Wild

Paper Title

im2nerf: Image to Neural Radiance Field in the Wild

Authors

Lu Mi, Abhijit Kundu, David Ross, Frank Dellaert, Noah Snavely, Alireza Fathi

Abstract

We propose im2nerf, a learning framework that predicts a continuous neural object representation given a single input image in the wild, supervised only by segmentation output from off-the-shelf recognition methods. The standard approach to constructing neural radiance fields takes advantage of multi-view consistency and requires many calibrated views of a scene, a requirement that cannot be satisfied when learning on large-scale image data in the wild. We take a step towards addressing this shortcoming by introducing a model that encodes the input image into a disentangled object representation that contains a code for object shape, a code for object appearance, and an estimated camera pose from which the object image is captured. Our model conditions a NeRF on the predicted object representation and uses volume rendering to generate images from novel views. We train the model end-to-end on a large collection of input images. As the model is only provided with single-view images, the problem is highly under-constrained. Therefore, in addition to using a reconstruction loss on the synthesized input view, we use an auxiliary adversarial loss on the novel rendered views. Furthermore, we leverage object symmetry and cycle camera pose consistency. We conduct extensive quantitative and qualitative experiments on the ShapeNet dataset as well as qualitative experiments on the Open Images dataset. We show that in all cases, im2nerf achieves state-of-the-art performance for novel view synthesis from a single-view unposed image in the wild.
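
The abstract states that the model uses volume rendering to generate images from novel views. As a rough illustration of that step only, the following is a minimal NumPy sketch of the standard NeRF compositing rule along a single ray. It is not the authors' implementation: the function name `composite_ray` and its inputs (per-sample densities, colors, and interval lengths) are assumptions made for this example, and the conditioning on shape and appearance codes is omitted.

```python
import numpy as np


def composite_ray(densities, colors, deltas):
    """Composite samples along one ray with the standard NeRF quadrature.

    Hypothetical helper for illustration; not taken from the im2nerf code.
    densities: (N,) non-negative volume densities sigma_i
    colors:    (N, 3) RGB values c_i predicted at each sample
    deltas:    (N,) distances between consecutive samples
    """
    densities = np.asarray(densities, dtype=np.float64)
    colors = np.asarray(colors, dtype=np.float64)
    deltas = np.asarray(deltas, dtype=np.float64)

    # Opacity of each interval: alpha_i = 1 - exp(-sigma_i * delta_i)
    alphas = 1.0 - np.exp(-densities * deltas)

    # Transmittance reaching sample i: T_i = prod_{j < i} (1 - alpha_j)
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))

    # Contribution of each sample to the pixel color
    weights = transmittance * alphas
    rgb = (weights[:, None] * colors).sum(axis=0)
    return rgb, weights
```

In a pipeline like the one described, the densities and colors would come from the conditioned radiance field evaluated at points along rays cast from the estimated (or a novel) camera pose. The per-sample weights also sum to an accumulated opacity that could be compared against a segmentation mask, though whether im2nerf uses them that way is not stated in the abstract.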
