Paper Title
Learning Granularity-Unified Representations for Text-to-Image Person Re-identification
Paper Authors
Paper Abstract
Text-to-image person re-identification (ReID) aims to search for pedestrian images of an identity of interest via textual descriptions. It is challenging due to both rich intra-modal variations and significant inter-modal gaps. Existing works usually ignore the difference in feature granularity between the two modalities, i.e., visual features are usually fine-grained while textual features are coarse, which is largely responsible for the large inter-modal gaps. In this paper, we propose an end-to-end framework based on transformers to learn granularity-unified representations for both modalities, denoted as LGUR. The LGUR framework contains two modules: a Dictionary-based Granularity Alignment (DGA) module and a Prototype-based Granularity Unification (PGU) module. In DGA, in order to align the granularities of the two modalities, we introduce a Multi-modality Shared Dictionary (MSD) to reconstruct both visual and textual features. Besides, DGA incorporates two important factors, i.e., cross-modality guidance and foreground-centric reconstruction, to facilitate the optimization of the MSD. In PGU, we adopt a set of shared and learnable prototypes as queries to extract diverse and semantically aligned features for both modalities in the granularity-unified feature space, which further promotes ReID performance. Comprehensive experiments show that our LGUR consistently outperforms state-of-the-art methods by large margins on both the CUHK-PEDES and ICFG-PEDES datasets. Code will be released at https://github.com/ZhiyinShao-H/LGUR.
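The core idea of the abstract can be illustrated with a toy sketch: both DGA and PGU can be viewed as attention operations, where DGA re-expresses each modality's features over a shared dictionary so that the two modalities use a common basis, and PGU lets a fixed set of shared prototypes query each modality so both sides yield the same number of aligned part features. The sketch below is a minimal, hedged illustration using plain scaled dot-product attention; the toy dimensions, dictionary atoms, and prototype values are hypothetical and not from the paper, which uses learned transformer modules.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(queries, keys, values):
    """Scaled dot-product attention: each query becomes a
    weighted sum of the value vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, values))
                    for j in range(len(values[0]))])
    return out

# Hypothetical toy features (2-D for readability).
MSD_ATOMS = [[1.0, 0.0], [0.0, 1.0]]   # multi-modality shared dictionary
visual    = [[0.9, 0.1], [0.2, 0.8]]   # fine-grained visual tokens
textual   = [[0.6, 0.4]]               # coarser textual tokens

# DGA idea: reconstruct BOTH modalities from the SAME dictionary atoms,
# so their features are expressed in a shared, granularity-aligned basis.
visual_rec  = attend(visual,  MSD_ATOMS, MSD_ATOMS)
textual_rec = attend(textual, MSD_ATOMS, MSD_ATOMS)

# PGU idea: shared (in the paper, learnable) prototypes act as queries on
# each modality, yielding the same number of semantically aligned features
# regardless of how many tokens each modality started with.
PROTOTYPES = [[1.0, 1.0], [1.0, -1.0]]
visual_parts  = attend(PROTOTYPES, visual_rec,  visual_rec)
textual_parts = attend(PROTOTYPES, textual_rec, textual_rec)

print(len(visual_parts), len(textual_parts))  # both equal len(PROTOTYPES)
```

Note how the prototype queries make the two output sets directly comparable: one feature per prototype on each side, which is what allows matching in a granularity-unified space.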