Paper Title

Camera Alignment and Weighted Contrastive Learning for Domain Adaptation in Video Person ReID

Paper Authors

Djebril Mekhazni, Maximilien Dufau, Christian Desrosiers, Marco Pedersoli, Eric Granger

Paper Abstract

Systems for person re-identification (ReID) can achieve high accuracy when trained on large, fully-labeled image datasets. However, the domain shift typically associated with diverse operational capture conditions (e.g., camera viewpoints and lighting) may translate to a significant decline in performance. This paper focuses on unsupervised domain adaptation (UDA) for video-based ReID - a relevant scenario that is less explored in the literature. In this scenario, the ReID model must adapt to a complex target domain defined by a network of diverse video cameras, based on tracklet information. State-of-the-art methods cluster unlabeled target data, yet domain shifts across target cameras (sub-domains) can lead to poor initialization of clustering methods that propagates noise across epochs, preventing the ReID model from accurately associating samples of the same identity. In this paper, a UDA method is introduced for video person ReID that leverages knowledge of video tracklets, and of the distribution of frames captured over target cameras, to improve the performance of CNN backbones trained using pseudo-labels. Our method relies on an adversarial approach, where a camera-discriminator network is introduced to extract discriminant camera-independent representations, facilitating the subsequent clustering. In addition, a weighted contrastive loss is proposed to leverage the confidence of clusters and mitigate the risk of incorrect identity associations. Experimental results obtained on three challenging video-based person ReID datasets - PRID2011, iLIDS-VID, and MARS - indicate that our proposed method can outperform related state-of-the-art methods. Our code is available at: \url{https://github.com/dmekhazni/CAWCL-ReID}
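The abstract describes two components that lend themselves to short sketches. First, a minimal PyTorch sketch of the adversarial camera-alignment idea, assuming a gradient reversal layer in front of a small camera classifier; the layer widths, the `lambd` weight, and the `CameraDiscriminator` name are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; scales and negates gradients in backward."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reversed gradient flows back into the backbone; None for the lambd arg.
        return -ctx.lambd * grad_output, None


class CameraDiscriminator(nn.Module):
    """Predicts the source camera of a feature vector. Trained adversarially
    (via gradient reversal), it pushes the backbone toward camera-independent
    representations, which should make the subsequent clustering easier."""

    def __init__(self, feat_dim: int, num_cameras: int, lambd: float = 1.0):
        super().__init__()
        self.lambd = lambd
        self.classifier = nn.Sequential(
            nn.Linear(feat_dim, 512),
            nn.ReLU(inplace=True),
            nn.Linear(512, num_cameras),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # Minimizing camera cross-entropy here maximizes camera confusion
        # in the backbone, thanks to the reversed gradients.
        return self.classifier(GradReverse.apply(features, self.lambd))
```

Second, a hedged sketch of a confidence-weighted contrastive loss over cluster centroids: each pseudo-labeled sample contributes in proportion to a confidence weight, so uncertain cluster assignments contribute less to the gradient. The `weights` input (e.g., derived from distance to the assigned centroid) and the function signature are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def weighted_contrastive_loss(
    features: torch.Tensor,       # (B, D) L2-normalized tracklet embeddings
    pseudo_labels: torch.Tensor,  # (B,)   cluster assignments (long)
    centroids: torch.Tensor,      # (K, D) L2-normalized cluster centroids
    weights: torch.Tensor,        # (B,)   per-sample cluster confidence in [0, 1]
    temperature: float = 0.07,
) -> torch.Tensor:
    """Contrastive loss against centroids, down-weighted by cluster confidence
    to mitigate noise from incorrect identity associations."""
    logits = features @ centroids.t() / temperature
    per_sample = F.cross_entropy(logits, pseudo_labels, reduction="none")
    return (weights * per_sample).sum() / weights.sum().clamp(min=1e-8)
```

In a typical UDA training loop of this kind, the features would come from the CNN backbone, the pseudo-labels and centroids from a clustering step re-run each epoch, and the two losses would be combined with the usual ReID objectives.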
