ACSeg: Adaptive Conceptualization for Unsupervised Semantic Segmentation

Paper Title

ACSeg: Adaptive Conceptualization for Unsupervised Semantic Segmentation

Authors

Kehan Li, Zhennan Wang, Zesen Cheng, Runyi Yu, Yian Zhao, Guoli Song, Chang Liu, Li Yuan, Jie Chen

Abstract

Recently, self-supervised large-scale visual pre-training models have shown great promise in representing pixel-level semantic relationships, significantly promoting the development of unsupervised dense prediction tasks, e.g., unsupervised semantic segmentation (USS). The extracted relationships among pixel-level representations typically contain rich class-aware information: semantically identical pixel embeddings gather together in the representation space to form sophisticated concepts. However, leveraging the learned models to ascertain semantically consistent pixel groups or regions in an image is non-trivial, since over- or under-clustering overwhelms the conceptualization procedure under the varying semantic distributions of different images. In this work, we investigate pixel-level semantic aggregation in self-supervised ViT pre-trained models as image segmentation and propose the Adaptive Conceptualization approach for USS, termed ACSeg. Concretely, we explicitly encode concepts into learnable prototypes and design the Adaptive Concept Generator (ACG), which adaptively maps these prototypes to informative concepts for each image. Meanwhile, considering the scene complexity of different images, we propose the modularity loss, which optimizes the ACG independently of the number of concepts by estimating the intensity with which pixel pairs belong to the same concept. Finally, we turn the USS task into classifying the discovered concepts in an unsupervised manner. Extensive experiments with state-of-the-art results demonstrate the effectiveness of the proposed ACSeg.
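
The abstract gives no formula for the modularity loss, so the following is only a rough sketch of the general idea, assuming the classic Newman modularity computed on a cosine-similarity graph over pixel embeddings with a soft pixel-to-concept assignment. The function names, the choice of affinity, and the toy data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def soft_modularity(features, assign):
    """Newman-style modularity of a soft pixel-to-concept assignment.

    features: (N, D) pixel embeddings (assumed to come from a frozen backbone).
    assign:   (N, K) soft assignment of pixels to K concepts, rows sum to 1.
    Both the affinity and the soft relaxation are assumptions of this sketch.
    """
    # Cosine similarity between pixels, clipped so edge weights are non-negative.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    affinity = np.clip(f @ f.T, 0.0, None)
    np.fill_diagonal(affinity, 0.0)  # no self-loops

    degree = affinity.sum(axis=1)
    two_m = degree.sum()  # twice the total edge weight

    # Modularity matrix: observed affinity minus what a random graph
    # with the same degrees would be expected to have.
    b = affinity - np.outer(degree, degree) / two_m

    # same_concept[i, j]: how strongly pixels i and j share a concept.
    same_concept = assign @ assign.T
    return float((b * same_concept).sum() / two_m)


def modularity_loss(features, assign):
    """Negative modularity, so that minimizing the loss maximizes modularity."""
    return -soft_modularity(features, assign)


# Toy example: six "pixels" forming two clean groups in embedding space.
feats = np.array([[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3)

# Three concept slots are available, but only two are actually used.
good = np.array([[1, 0, 0]] * 3 + [[0, 1, 0]] * 3, dtype=float)
# Under-clustered alternative: every pixel in a single concept.
merged = np.array([[1, 0, 0]] * 6, dtype=float)

q_good = soft_modularity(feats, good)
q_merged = soft_modularity(feats, merged)
```

In this toy case the unused third concept slot does not change the score, which illustrates how an objective of this kind can be optimized without fixing the number of concepts per image in advance.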
