DEYO: DETR with YOLO for Step-by-Step Object Detection

Paper Title

DEYO: DETR with YOLO for Step-by-Step Object Detection

Author

Ouyang, Haodong

Abstract

Object detection is an important topic in computer vision, with post-processing, an essential part of the typical object detection pipeline, posing a significant bottleneck affecting the performance of traditional object detection models. The detection transformer (DETR), as the first end-to-end target detection model, discards the requirement of manual components like the anchor and non-maximum suppression (NMS), significantly simplifying the target detection process. However, compared with most traditional object detection models, DETR converges very slowly, and a query's meaning is obscure. Thus, inspired by the Step-by-Step concept, this paper proposes a new two-stage object detection model, named DETR with YOLO (DEYO), which relies on a progressive inference to solve the above problems. DEYO is a two-stage architecture comprising a classic target detection model and a DETR-like model as the first and second stages, respectively. Specifically, the first stage provides high-quality query and anchor feeding into the second stage, improving the performance and efficiency of the second stage compared to the original DETR model. Meanwhile, the second stage compensates for the performance degradation caused by the first stage detector's limitations. Extensive experiments demonstrate that DEYO attains 50.6 AP and 52.1 AP in 12 and 36 epochs, respectively, while utilizing ResNet-50 as the backbone and multi-scale features on the COCO dataset. Compared with DINO, an optimal DETR-like model, the developed DEYO model affords a significant performance improvement of 1.6 AP and 1.2 AP in two epoch settings.
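The hand-off between the two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Detection` type, the `select_queries` helper, the top-k selection rule, and the `(cx, cy, w, h)` box convention are all assumptions introduced here only to show how first-stage predictions might be passed on as queries and anchors to a DETR-like second stage.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (cx, cy, w, h) in normalized coordinates; an assumed convention, not from the paper.
Box = Tuple[float, float, float, float]


@dataclass
class Detection:
    """One hypothetical first-stage (YOLO-style) prediction."""
    box: Box
    score: float
    feature: List[float]  # embedding tied to the prediction (assumed to exist)


def select_queries(detections: List[Detection], k: int):
    """Keep the k highest-scoring detections and split them into
    (queries, anchors) for a hypothetical DETR-like second stage."""
    top = sorted(detections, key=lambda d: d.score, reverse=True)[:k]
    queries = [d.feature for d in top]  # content queries for the decoder
    anchors = [d.box for d in top]      # initial reference boxes
    return queries, anchors


# Toy first-stage output: two overlapping boxes and one separate box.
detections = [
    Detection((0.50, 0.50, 0.20, 0.30), 0.91, [0.1, 0.7]),
    Detection((0.52, 0.49, 0.21, 0.29), 0.40, [0.2, 0.6]),
    Detection((0.10, 0.80, 0.05, 0.10), 0.75, [0.9, 0.3]),
]
queries, anchors = select_queries(detections, k=2)
```

In this sketch the second stage would start from two queries whose anchors already sit near likely objects, rather than from uninformed learned queries. That is one plausible reading of why the abstract reports faster convergence than the original DETR.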
