Panoptic SwiftNet: Pyramidal Fusion for Real-time Panoptic Segmentation

Paper title

Panoptic SwiftNet: Pyramidal Fusion for Real-time Panoptic Segmentation

Paper authors

Josip Šarić, Marin Oršić, Siniša Šegvić

Paper abstract

Dense panoptic prediction is a key ingredient in many existing applications such as autonomous driving, automated warehouses or remote sensing. Many of these applications require fast inference over large input resolutions on affordable or even embedded hardware. We propose to achieve this goal by trading off backbone capacity for multi-scale feature extraction. In comparison with contemporaneous approaches to panoptic segmentation, the main novelties of our method are efficient scale-equivariant feature extraction, cross-scale upsampling through pyramidal fusion and boundary-aware learning of pixel-to-instance assignment. The proposed method is very well suited for remote sensing imagery due to the huge number of pixels in typical city-wide and region-wide datasets. We present panoptic experiments on Cityscapes, Vistas, COCO and the BSB-Aerial dataset. Our models outperform the state of the art on the BSB-Aerial dataset while being able to process more than a hundred 1MPx images per second on an RTX3090 GPU with FP16 precision and TensorRT optimization.
