Mask3D: Mask Transformer for 3D Semantic Instance Segmentation

Paper Title

Mask3D: Mask Transformer for 3D Semantic Instance Segmentation

Authors

Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe

Abstract

Modern 3D semantic instance segmentation approaches predominantly rely on specialized voting mechanisms followed by carefully designed geometric clustering techniques. Building on the successes of recent Transformer-based methods for object detection and image segmentation, we propose the first Transformer-based approach for 3D semantic instance segmentation. We show that we can leverage generic Transformer building blocks to directly predict instance masks from 3D point clouds. In our model called Mask3D each object instance is represented as an instance query. Using Transformer decoders, the instance queries are learned by iteratively attending to point cloud features at multiple scales. Combined with point features, the instance queries directly yield all instance masks in parallel. Mask3D has several advantages over current state-of-the-art approaches, since it neither relies on (1) voting schemes which require hand-selected geometric properties (such as centers) nor (2) geometric grouping mechanisms requiring manually-tuned hyper-parameters (e.g. radii) and (3) enables a loss that directly optimizes instance masks. Mask3D sets a new state-of-the-art on ScanNet test (+6.2 mAP), S3DIS 6-fold (+10.1 mAP), STPLS3D (+11.2 mAP) and ScanNet200 test (+12.4 mAP).
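The abstract's statement that instance queries, combined with point features, yield all instance masks in parallel can be illustrated with a minimal sketch. It assumes each mask is obtained by taking the dot product of one query with every point feature, applying a sigmoid, and thresholding at 0.5. The abstract does not spell out this step, so the similarity function, the threshold, the helper name `predict_instance_masks`, and the toy values are all assumptions for illustration, not the authors' implementation.

```python
import math


def predict_instance_masks(point_features, instance_queries, threshold=0.5):
    """Hypothetical helper: return one binary point mask per instance query.

    Assumption: mask logit = dot(query, point feature), then sigmoid + threshold.
    """
    masks = []
    for query in instance_queries:
        mask = []
        for feat in point_features:
            # Similarity between this query and this point's feature vector.
            logit = sum(q * f for q, f in zip(query, feat))
            prob = 1.0 / (1.0 + math.exp(-logit))
            mask.append(prob > threshold)
        masks.append(mask)
    return masks


# Toy data (made up): 4 points with 2-D features, 2 instance queries.
point_features = [[2.0, -2.0], [1.5, -1.0], [-2.0, 2.0], [-1.0, 3.0]]
instance_queries = [[1.0, -1.0], [-1.0, 1.0]]

masks = predict_instance_masks(point_features, instance_queries)
for i, mask in enumerate(masks):
    print(f"query {i}: {mask}")
```

Each query's mask is computed independently of the others, so no voting or geometric grouping step is involved in this sketch; in a tensor library the two loops would collapse into a single matrix product between the query matrix and the point-feature matrix.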

