Paper title
Sparse R-CNN: End-to-End Object Detection with Learnable Proposals
Paper authors
Paper abstract
We present Sparse R-CNN, a purely sparse method for object detection in images. Existing work on object detection relies heavily on dense object candidates, such as $k$ anchor boxes pre-defined on every grid cell of an image feature map of size $H\times W$. In our method, however, a fixed sparse set of $N$ learned object proposals is provided to the object recognition head to perform classification and localization. By reducing $HWk$ (up to hundreds of thousands) hand-designed object candidates to $N$ (e.g., 100) learnable proposals, Sparse R-CNN completely avoids all effort related to object candidate design and many-to-one label assignment. More importantly, final predictions are output directly, without a non-maximum suppression post-processing step. Sparse R-CNN demonstrates accuracy, run-time, and training convergence performance on par with well-established detector baselines on the challenging COCO dataset, e.g., achieving 45.0 AP with the standard $3\times$ training schedule and running at 22 fps with a ResNet-50 FPN model. We hope our work can inspire a rethinking of the convention of dense priors in object detectors. The code is available at: https://github.com/PeizeSun/SparseR-CNN.
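To make the $HWk$ versus $N$ comparison concrete, the following minimal sketch counts dense candidates for a multi-level feature pyramid. The input size (800 by 1333), the strides (8 to 128), and $k = 9$ anchors per cell are illustrative assumptions, not values from the abstract, and `dense_candidate_count` is a hypothetical helper; only $N = 100$ is taken from the text.

```python
import math


def dense_candidate_count(height, width, strides, anchors_per_cell):
    """Sum H*W*k over feature-map levels, where each level's H and W
    are the image size divided by that level's stride (rounded up)."""
    total = 0
    for stride in strides:
        level_h = math.ceil(height / stride)
        level_w = math.ceil(width / stride)
        total += level_h * level_w * anchors_per_cell
    return total


# Assumed setup (not from the abstract): 800x1333 input,
# five pyramid levels, 9 anchors per grid cell.
dense = dense_candidate_count(800, 1333, (8, 16, 32, 64, 128), 9)
sparse = 100  # N, the example value given in the abstract

print(f"dense candidates: {dense}")
print(f"learnable proposals: {sparse}")
print(f"reduction factor: {dense / sparse:.0f}x")
```

Under these assumed settings the dense count lands in the hundreds of thousands, consistent with the order of magnitude the abstract cites for $HWk$.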