Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection

Paper Title

Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection

Authors

Xiang Li, Wenhai Wang, Xiaolin Hu, Jun Li, Jinhui Tang, Jian Yang

Abstract

Localization Quality Estimation (LQE) is crucial and popular in recent dense object detectors because it provides accurate ranking scores that benefit Non-Maximum Suppression (NMS) and improve detection performance. As a common practice, most existing methods predict LQE scores from vanilla convolutional features shared with object classification or bounding box regression. In this paper, we explore a completely novel and different perspective: performing LQE based on the learned distributions of the four parameters of the bounding box. These bounding box distributions were introduced as the "General Distribution" in GFLV1, and they describe the uncertainty of the predicted bounding boxes well. This property makes the distribution statistics of a bounding box highly correlated with its real localization quality. Specifically, a bounding box distribution with a sharp peak usually corresponds to high localization quality, and vice versa. By leveraging the close correlation between distribution statistics and real localization quality, we develop a considerably lightweight Distribution-Guided Quality Predictor (DGQP) for reliable LQE based on GFLV1, thus producing GFLV2. To the best of our knowledge, it is the first attempt in object detection to use a highly relevant statistical representation to facilitate LQE. Extensive experiments demonstrate the effectiveness of our method. Notably, GFLV2 (ResNet-101) achieves 46.2 AP at 14.6 FPS, surpassing the previous state-of-the-art ATSS baseline (43.6 AP at 14.6 FPS) by an absolute 2.6 AP on COCO test-dev, without sacrificing efficiency in either training or inference. Code will be available at https://github.com/implus/GFocalV2.
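
The link between a sharply peaked box distribution and high localization quality can be illustrated with a small NumPy sketch. This is not the authors' implementation. The abstract does not say which statistics DGQP uses, so the choices here (17 discrete bins per box side, the top-4 probabilities plus their mean as the statistic, and the function name `distribution_statistics`) are assumptions made for illustration only. The sketch computes the input statistics and leaves out the learned predictor that would map them to a quality score.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax over the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def distribution_statistics(side_logits, k=4):
    """Summarize the four per-side box distributions (hypothetical helper).

    side_logits: array of shape (4, n_bins), one row of bin logits per box side.
    Returns a flat vector of length 4 * (k + 1): for each side, the k largest
    bin probabilities in descending order, followed by their mean.
    The choice of statistic is an assumption, not taken from the abstract.
    """
    probs = softmax(side_logits, axis=-1)
    # Sort ascending, reverse to descending, keep the k largest per side.
    topk = np.sort(probs, axis=-1)[:, ::-1][:, :k]
    topk_mean = topk.mean(axis=-1, keepdims=True)
    return np.concatenate([topk, topk_mean], axis=-1).reshape(-1)


n_bins = 17  # assumed number of discrete bins per side

# A confident prediction: one bin dominates on every side.
sharp_logits = np.zeros((4, n_bins))
sharp_logits[:, 5] = 8.0

# An uncertain prediction: all bins are equally likely.
flat_logits = np.zeros((4, n_bins))

sharp_stats = distribution_statistics(sharp_logits)
flat_stats = distribution_statistics(flat_logits)

# The largest bin probability is close to 1 for the sharp distribution and
# equal to 1 / n_bins for the flat one.
print("sharp top-1:", sharp_stats[0])
print("flat  top-1:", flat_stats[0])
```

A quality predictor fed with these statistics sees a clear numerical difference between confident and uncertain boxes, which is the correlation the abstract relies on.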
