Paper Title

Attention-based Multi-modal Fusion Network for Semantic Scene Completion

Authors

Siqi Li, Changqing Zou, Yipeng Li, Xibin Zhao, Yue Gao

Abstract

This paper presents an end-to-end 3D convolutional network named attention-based multi-modal fusion network (AMFNet) for the semantic scene completion (SSC) task of inferring the occupancy and semantic labels of a volumetric 3D scene from single-view RGB-D images. Compared with previous methods, which use only the semantic features extracted from RGB-D images, the proposed AMFNet learns to perform effective 3D scene completion and semantic segmentation simultaneously by leveraging the experience of inferring 2D semantic segmentation from RGB-D images as well as reliable depth cues in the spatial dimension. This is achieved by employing a multi-modal fusion architecture boosted from 2D semantic segmentation and a 3D semantic completion network empowered by residual attention blocks. We validate our method on both the synthetic SUNCG-RGBD dataset and the real NYUv2 dataset; the results show that our method achieves gains of 2.5% and 2.6% over the state-of-the-art method on the synthetic SUNCG-RGBD dataset and the real NYUv2 dataset, respectively.
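The abstract does not spell out the internals of the residual attention blocks that empower the 3D completion network. The sketch below is only an illustrative guess under the assumption that they follow the common trunk-plus-sigmoid-mask pattern applied to 3D voxel features; the class name `ResidualAttentionBlock3D` and all hyper-parameters are ours for illustration, not the authors' exact design.

```python
import torch
import torch.nn as nn


class ResidualAttentionBlock3D(nn.Module):
    """Minimal sketch of a 3D residual attention block (assumed design).

    A trunk branch of 3D convolutions is reweighted voxel-wise by a
    sigmoid-gated mask branch, and the input is added back as a residual
    so gradients can bypass the attention path.
    """

    def __init__(self, channels: int):
        super().__init__()
        # Trunk branch: plain 3D convolutional features.
        self.trunk = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm3d(channels),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm3d(channels),
        )
        # Mask branch: per-voxel attention weights in (0, 1).
        self.mask = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        t = self.trunk(x)
        a = self.mask(x)
        # Attention modulates the trunk output; the skip connection keeps
        # the block residual.
        return torch.relu(x + a * t)


if __name__ == "__main__":
    # Toy voxel feature volume: batch 1, 32 channels, 16^3 grid.
    feats = torch.randn(1, 32, 16, 16, 16)
    block = ResidualAttentionBlock3D(32)
    print(block(feats).shape)  # torch.Size([1, 32, 16, 16, 16])
```

Such a block can be stacked inside the 3D completion branch, while the 2D segmentation features (lifted to 3D using the depth map) would be fused with the voxel features before or between these blocks; the exact fusion points are described in the paper, not here.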
