Paper Title
CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers
Paper Authors
Paper Abstract
Scene understanding based on image segmentation is a crucial component of autonomous vehicles. Pixel-wise semantic segmentation of RGB images can be advanced by exploiting complementary features from the supplementary modality (X-modality). However, covering a wide variety of sensors with a modality-agnostic model remains an unresolved problem due to variations in sensor characteristics among different modalities. Unlike previous modality-specific methods, in this work, we propose a unified fusion framework, CMX, for RGB-X semantic segmentation. To generalize well across different modalities, that often include supplements as well as uncertainties, a unified cross-modal interaction is crucial for modality fusion. Specifically, we design a Cross-Modal Feature Rectification Module (CM-FRM) to calibrate bi-modal features by leveraging the features from one modality to rectify the features of the other modality. With rectified feature pairs, we deploy a Feature Fusion Module (FFM) to perform sufficient exchange of long-range contexts before mixing. To verify CMX, for the first time, we unify five modalities complementary to RGB, i.e., depth, thermal, polarization, event, and LiDAR. Extensive experiments show that CMX generalizes well to diverse multi-modal fusion, achieving state-of-the-art performances on five RGB-Depth benchmarks, as well as RGB-Thermal, RGB-Polarization, and RGB-LiDAR datasets. Besides, to investigate the generalizability to dense-sparse data fusion, we establish an RGB-Event semantic segmentation benchmark based on the EventScape dataset, on which CMX sets the new state-of-the-art. The source code of CMX is publicly available at https://github.com/huaaaliu/RGBX_Semantic_Segmentation.
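The abstract describes CM-FRM calibrating bi-modal features by using one modality to rectify the other. As a rough illustration of this idea (not the paper's actual module), the following NumPy sketch applies channel-wise attention computed from one modality to reweight the features of the other; the function names, the sigmoid gating, and the residual mixing weight `lam` are all illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rectify(feat_a, feat_b, lam=0.5):
    """Illustrative cross-modal rectification (assumed scheme, not CM-FRM).

    Each modality's feature map is calibrated with channel attention
    derived from the other modality, added as a residual correction.
    feat_a, feat_b: arrays of shape (C, H, W); lam is an assumed weight.
    """
    # Channel descriptors via global average pooling over spatial dims
    w_a = sigmoid(feat_b.mean(axis=(1, 2)))  # attention for A, from B
    w_b = sigmoid(feat_a.mean(axis=(1, 2)))  # attention for B, from A
    # Residual calibration: original features plus cross-modal reweighting
    rect_a = feat_a + lam * w_a[:, None, None] * feat_a
    rect_b = feat_b + lam * w_b[:, None, None] * feat_b
    return rect_a, rect_b

# Toy RGB and X-modality (e.g., depth) feature maps
rgb = np.random.randn(64, 8, 8)
depth = np.random.randn(64, 8, 8)
rect_rgb, rect_depth = rectify(rgb, depth)
print(rect_rgb.shape, rect_depth.shape)  # (64, 8, 8) (64, 8, 8)
```

In CMX, the rectified feature pairs would then be passed to the Feature Fusion Module for long-range context exchange before mixing; that stage is not sketched here.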