DQnet: Cross-Model Detail Querying for Camouflaged Object Detection

Paper title

DQnet: Cross-Model Detail Querying for Camouflaged Object Detection

Authors

Sun, Wei; Liu, Chengao; Zhang, Linyan; Li, Yu; Wei, Pengxu; Liu, Chang; Zou, Jialing; Jiao, Jianbin; Ye, Qixiang

Abstract

Camouflaged objects blend seamlessly into their surroundings, which makes detecting them a challenging task in computer vision. Optimizing a convolutional neural network (CNN) for camouflaged object detection (COD) tends to activate local discriminative regions while ignoring the complete object extent, causing a partial activation issue that inevitably leads to missing or redundant object regions. In this paper, we argue that partial activation is caused by an intrinsic characteristic of CNNs: convolution operations produce local receptive fields and have difficulty capturing long-range feature dependencies among image regions. To obtain feature maps that activate the full object extent and keep the segmentation results from being overwhelmed by noisy features, we propose a novel framework termed the Cross-Model Detail Querying network (DQnet). It reasons about the relations between long-range-aware representations and multi-scale local details so that the enhanced representation fully highlights object regions and eliminates noise in non-object regions. Specifically, a vanilla ViT pretrained with self-supervised learning (SSL) is employed to model long-range dependencies among image regions, and a ResNet is employed to learn fine-grained spatial local details at multiple scales. Then, to effectively retrieve object-related details, a Relation-Based Querying (RBQ) module is proposed to explore window-based interactions between the global representations and the multi-scale local details. Extensive experiments on widely used COD datasets show that DQnet outperforms the current state-of-the-art methods.
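
The abstract does not specify how the RBQ module computes its window-based interactions, so the following is only an illustrative sketch of one generic way such an interaction could look: plain window-based cross-attention in NumPy, where tokens from a global feature map act as queries and tokens from a local-detail feature map act as keys and values. It is not the authors' RBQ implementation. The function names (`window_partition`, `window_merge`, `window_cross_attention`), the absence of learned projections, the residual connection, and the assumption that both maps share the same shape are all hypothetical choices made for illustration.

```python
import numpy as np

def window_partition(x, w):
    """Split an (H, W, C) map into (num_windows, w*w, C) token groups."""
    H, W, C = x.shape
    assert H % w == 0 and W % w == 0, "H and W must be multiples of w"
    x = x.reshape(H // w, w, W // w, w, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, w * w, C)

def window_merge(tokens, w, H, W):
    """Inverse of window_partition: rebuild the (H, W, C) map."""
    C = tokens.shape[-1]
    x = tokens.reshape(H // w, W // w, w, w, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(H, W, C)

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def window_cross_attention(global_feat, local_feat, w):
    """Hypothetical sketch: global tokens query local tokens per window.

    Assumes both inputs are (H, W, C) arrays of identical shape and uses
    identity projections; a real module would likely learn Q/K/V weights.
    """
    H, W, C = global_feat.shape
    q = window_partition(global_feat, w)    # queries from the global branch
    kv = window_partition(local_feat, w)    # keys/values from the local branch
    scores = q @ kv.transpose(0, 2, 1) / np.sqrt(C)  # (nW, w*w, w*w)
    attn = softmax(scores, axis=-1)         # each query attends within its window
    retrieved = attn @ kv                   # local details gathered per query
    # Residual connection (an assumption): keep the global representation
    # and add the retrieved local detail on top of it.
    return window_merge(retrieved, w, H, W) + global_feat
```

Restricting attention to windows keeps the cost proportional to the number of windows times the squared window size, rather than the squared number of all spatial positions, which is the usual motivation for window-based designs.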
