Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation

Title

Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation

Authors

Li, Xiangtai; Xu, Shilin; Yang, Yibo; Cheng, Guangliang; Tong, Yunhai; Tao, Dacheng

Abstract

Panoptic Part Segmentation (PPS) aims to unify panoptic segmentation and part segmentation into one task. Previous work mainly uses separate approaches to handle thing, stuff, and part predictions individually, without any shared computation or task association. In this work, we aim to unify these tasks at the architectural level, designing the first end-to-end unified method, named Panoptic-PartFormer. In particular, motivated by recent progress in Vision Transformers, we model things, stuff, and parts as object queries and directly learn to optimize all three predictions as a unified mask prediction and classification problem. We design a decoupled decoder to generate part features and thing/stuff features, respectively. We then propose to use all the queries and their corresponding features to perform reasoning jointly and iteratively. The final mask is obtained via the inner product between the queries and the corresponding features. Extensive ablation studies and analysis demonstrate the effectiveness of our framework. Panoptic-PartFormer achieves new state-of-the-art results on both the Cityscapes PPS and Pascal Context PPS datasets, with reductions of at least 70% in GFLOPs and 50% in parameters. In particular, on the Pascal Context PPS dataset we obtain a 3.4% relative improvement with a ResNet50 backbone and a 10% improvement after adopting Swin Transformer. To the best of our knowledge, we are the first to solve the PPS problem via a unified, end-to-end transformer model. Given its effectiveness and conceptual simplicity, we hope Panoptic-PartFormer can serve as a good baseline and aid future unified research on PPS. Our code and models are available at https://github.com/lxtGH/Panoptic-PartFormer.
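The abstract states that things, stuff, and parts are all represented as object queries. It also says each final mask comes from an inner product between a query and its corresponding features: part features for part queries, thing/stuff features for thing and stuff queries. The following is a minimal pure-Python sketch of that final step only. It assumes a sigmoid activation with a 0.5 threshold and a linear classification head. The function names (`predict_masks`, `classify`), the threshold, and the toy values are hypothetical and not taken from the authors' code. The joint, iterative reasoning across queries is not modeled.

```python
import math
from typing import List

# Hypothetical toy types: a query is a C-dim vector; a feature map is an
# H x W grid whose cells are C-dim vectors.
Vector = List[float]
FeatureMap = List[List[Vector]]
Mask = List[List[int]]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict_masks(queries: List[Vector], features: FeatureMap,
                  threshold: float = 0.5) -> List[Mask]:
    """One binary mask per query; pixel is on if sigmoid(<q, f_hw>) > threshold.

    The sigmoid + threshold decoding is an assumption for illustration.
    """
    return [
        [[int(sigmoid(dot(q, f)) > threshold) for f in row] for row in features]
        for q in queries
    ]


def classify(query: Vector, class_weights: List[Vector]) -> int:
    """Hypothetical linear classifier: index of the highest-scoring class."""
    scores = [dot(query, w) for w in class_weights]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy inputs (made-up values, C = 2, H = W = 2).
# Decoupled features: one map for things/stuff, a separate one for parts.
scene_feats = [[[1.0, 0.0], [1.0, 0.0]],
               [[0.0, 1.0], [0.0, 1.0]]]
part_feats = [[[2.0, -1.0], [-1.0, 2.0]],
              [[2.0, -1.0], [-1.0, 2.0]]]

thing_stuff_queries = [[3.0, -3.0], [-3.0, 3.0]]
part_queries = [[1.0, -1.0]]

# Thing/stuff queries are matched against thing/stuff features,
# part queries against part features.
print(predict_masks(thing_stuff_queries, scene_feats))
# [[[1, 1], [0, 0]], [[0, 0], [1, 1]]]
print(predict_masks(part_queries, part_feats))
# [[[1, 0], [1, 0]]]
print([classify(q, [[1.0, 0.0], [0.0, 1.0]]) for q in thing_stuff_queries])
# [0, 1]
```

In the model described above, the queries and features are learned and refined jointly over several reasoning steps. This sketch only shows how a fixed set of queries turns into per-query masks and class labels once those features exist.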
