Title
Panoptic Neural Fields: A Semantic Object-Aware Neural Scene Representation
Authors
Abstract
We present Panoptic Neural Fields (PNF), an object-aware neural scene representation that decomposes a scene into a set of objects (things) and background (stuff). Each object is represented by an oriented 3D bounding box and a multi-layer perceptron (MLP) that takes position, direction, and time as input and outputs density and radiance. The background stuff is represented by a similar MLP that additionally outputs semantic labels. Each object MLP is instance-specific and can therefore be smaller and faster than in previous object-aware approaches, while still leveraging category-specific priors incorporated via meta-learned initialization. Our model builds a panoptic radiance field representation of any scene from just color images. We use off-the-shelf algorithms to predict camera poses, object tracks, and 2D image semantic segmentations. We then jointly optimize the MLP weights and bounding box parameters using analysis-by-synthesis, with self-supervision from color images and pseudo-supervision from predicted semantic segmentations. In experiments on real-world dynamic scenes, we find that our model can be used effectively for several tasks, such as novel view synthesis, 2D panoptic segmentation, 3D scene editing, and multiview depth prediction.
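The abstract describes two kinds of fields: per-instance object MLPs mapping (position, direction, time) to density and radiance, and a background "stuff" MLP that additionally emits semantic logits. The sketch below illustrates those input/output signatures only; the layer widths, activations, class count, and use of untrained random weights are illustrative assumptions, not the paper's actual architecture or training setup.

```python
import numpy as np

def mlp(x, widths, out_dim, rng):
    """Tiny random-weight MLP (ReLU hidden layers, linear output).
    A stand-in for the paper's learned MLPs; weights here are untrained."""
    h, d_in = x, x.shape[-1]
    for w in widths:
        W = rng.standard_normal((d_in, w)) * 0.1
        h = np.maximum(h @ W, 0.0)
        d_in = w
    W = rng.standard_normal((d_in, out_dim)) * 0.1
    return h @ W

def object_field(pos, direction, t, rng):
    """Instance-specific object MLP: (x, d, t) -> (density, radiance).
    Widths [32, 32] are an assumption reflecting "smaller and faster"."""
    inp = np.concatenate([pos, direction, [t]])     # 3 + 3 + 1 inputs
    out = mlp(inp, widths=[32, 32], out_dim=4, rng=rng)
    sigma = np.log1p(np.exp(out[0]))                # softplus -> density >= 0
    rgb = 1.0 / (1.0 + np.exp(-out[1:4]))           # sigmoid -> radiance in (0, 1)
    return sigma, rgb

def stuff_field(pos, direction, rng, num_classes=5):
    """Background MLP: like an object MLP, but also outputs semantic logits.
    num_classes is a placeholder, not the paper's label set."""
    inp = np.concatenate([pos, direction])
    out = mlp(inp, widths=[64, 64], out_dim=4 + num_classes, rng=rng)
    sigma = np.log1p(np.exp(out[0]))
    rgb = 1.0 / (1.0 + np.exp(-out[1:4]))
    logits = out[4:]                                # per-class semantic logits
    return sigma, rgb, logits

rng = np.random.default_rng(0)
sigma, rgb = object_field(np.zeros(3), np.array([0.0, 0.0, 1.0]), t=0.5, rng=rng)
s_bg, rgb_bg, logits = stuff_field(np.zeros(3), np.array([0.0, 0.0, 1.0]), rng=rng)
```

In the full method these fields would be queried along camera rays and composited by volume rendering, with each object MLP evaluated only inside its oriented bounding box.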