Decoupled Self Attention for Accurate One Stage Object Detection

Title


Decoupled Self Attention for Accurate One Stage Object Detection

Authors

Wu, Kehe; Chen, Zuge; Ma, Qi; Zhang, Xiaoliang; Li, Wei

Abstract


Because object detection datasets are smaller than the image recognition dataset ImageNet, transfer learning has become a basic training method for deep-learning object detection models: the backbone network of the detection model is pretrained on ImageNet and then used to extract features for the classification and localization subtasks. However, the classification task focuses on features of an object's salient regions, whereas the localization task focuses on the object's edge features, so the features extracted by the pretrained backbone deviate to some extent from the features the localization task needs. To address this problem, this paper proposes a decoupled self-attention (DSA) module for one-stage object detection models. DSA contains two decoupled self-attention branches, so it can extract features suited to each task. It is placed between the FPN and the subtask head networks, where it extracts global features from the FPN-fused features independently for each task. Although the DSA module is structurally simple, it effectively improves object detection performance and can be easily embedded in many detection models. Our experiments are based on RetinaNet, a representative one-stage detection model. On the COCO dataset, DSA improves detection performance by 0.4% AP and 0.5% AP with ResNet50 and ResNet101 backbones, respectively. When the DSA module and an object confidence task are applied to RetinaNet together, the improvements with ResNet50 and ResNet101 backbones are 1.0% AP and 1.4% AP, respectively. These experimental results show the effectiveness of the DSA module. Code is at: https://github.com/chenzuge1/DSANet.git.
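The abstract describes the module only at the architecture level: one FPN-fused feature map is fed into two independent self-attention branches, one per subtask head. Below is a minimal pure-Python sketch of that data flow, assuming each branch is a scaled dot-product (non-local-style) spatial self-attention with a residual connection. The abstract does not specify the branch internals, so the class names `SelfAttentionBranch` and `DecoupledSelfAttention`, the random weight initialization, and the toy feature values are hypothetical illustrations, not the authors' implementation (which is in the linked repository).

```python
import math
import random
from typing import List, Tuple

# A feature map is flattened to N rows (spatial positions H*W) x C columns (channels).
Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def random_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.1) -> Matrix:
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class SelfAttentionBranch:
    """One self-attention branch (hypothetical non-local-style design).

    Every position attends to every position of the same FPN level,
    so each output feature aggregates global context.
    """

    def __init__(self, channels: int, rng: random.Random):
        self.w_q = random_matrix(channels, channels, rng)
        self.w_k = random_matrix(channels, channels, rng)
        self.w_v = random_matrix(channels, channels, rng)
        self.scale = 1.0 / math.sqrt(channels)

    def __call__(self, x: Matrix) -> Matrix:
        q, k, v = matmul(x, self.w_q), matmul(x, self.w_k), matmul(x, self.w_v)
        logits = [[s * self.scale for s in row] for row in matmul(q, transpose(k))]
        attn = softmax_rows(logits)  # (N, N) attention weights over all positions
        context = matmul(attn, v)    # (N, C) global context for each position
        # Residual connection keeps the original FPN feature (an assumption).
        return [[a + b for a, b in zip(xr, cr)] for xr, cr in zip(x, context)]


class DecoupledSelfAttention:
    """Two branches with separate weights: one feeds the classification head,
    the other feeds the box-regression head."""

    def __init__(self, channels: int, seed: int = 0):
        rng = random.Random(seed)
        self.cls_branch = SelfAttentionBranch(channels, rng)
        self.reg_branch = SelfAttentionBranch(channels, rng)

    def __call__(self, fpn_feature: Matrix) -> Tuple[Matrix, Matrix]:
        return self.cls_branch(fpn_feature), self.reg_branch(fpn_feature)


# Usage: one FPN level with a 2x2 grid (N = 4 positions) and C = 3 channels.
fpn_level = [[0.1, 0.2, 0.3], [0.0, 1.0, 0.5], [0.4, 0.4, 0.4], [1.0, 0.0, 0.2]]
dsa = DecoupledSelfAttention(channels=3)
cls_feature, reg_feature = dsa(fpn_level)  # same shape as the input, different values
```

In this sketch, "decoupled" simply means the two branches share the input but not their weights, so each subtask head receives its own globally attended feature. In a real detector the module would run on each FPN level's tensor (for example, in PyTorch) rather than on nested lists.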
