Paper Title
CIR-Net: Cross-modality Interaction and Refinement for RGB-D Salient Object Detection
Paper Authors
Paper Abstract
Focusing on how to effectively capture and utilize cross-modality information in the RGB-D salient object detection (SOD) task, we present a convolutional neural network (CNN) model, named CIR-Net, based on novel cross-modality interaction and refinement. For the cross-modality interaction, 1) a progressive attention-guided integration unit is proposed to sufficiently integrate RGB-D feature representations in the encoder stage, and 2) a convergence aggregation structure is proposed, which flows the RGB and depth decoding features into the corresponding RGB-D decoding streams via an importance gated fusion unit in the decoder stage. For the cross-modality refinement, we insert a refinement middleware structure between the encoder and the decoder, in which the RGB, depth, and RGB-D encoder features are further refined by successively applying a self-modality attention refinement unit and a cross-modality weighting refinement unit. Finally, with the gradually refined features, we predict the saliency map in the decoder stage. Extensive experiments on six popular RGB-D SOD benchmarks demonstrate that our network outperforms state-of-the-art saliency detectors both qualitatively and quantitatively.
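The importance gated fusion idea in the decoder can be pictured with a minimal numpy sketch. This is an illustrative assumption, not the paper's actual implementation: the function name, the feature shapes, and the use of a single 1x1-conv-style gate over the concatenated features are all hypothetical, chosen only to show how a learned gate can weight the two modality streams per spatial location.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def importance_gated_fusion(f_rgb, f_depth, w_gate, b_gate):
    """Hypothetical sketch of gated fusion of RGB and depth decoding features.

    f_rgb, f_depth: feature maps of shape (C, H, W).
    w_gate: (2C,) weights acting like a 1x1 conv that scores the
            channel-concatenated features; b_gate: scalar bias.
    The gate g in (0, 1) decides, per spatial location, how much each
    modality contributes to the fused RGB-D decoding stream.
    """
    concat = np.concatenate([f_rgb, f_depth], axis=0)                    # (2C, H, W)
    g = sigmoid(np.tensordot(w_gate, concat, axes=([0], [0])) + b_gate)  # (H, W)
    return g * f_rgb + (1.0 - g) * f_depth                               # (C, H, W)
```

Because the gate is a convex weight per location, the fused feature always lies between the RGB and depth responses, so neither modality can be silently discarded.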