Self-supervising Fine-grained Region Similarities for Large-scale Image Localization

Paper Title

Self-supervising Fine-grained Region Similarities for Large-scale Image Localization

Authors

Yixiao Ge, Haibo Wang, Feng Zhu, Rui Zhao, Hongsheng Li

Abstract

The task of large-scale retrieval-based image localization is to estimate the geographical location of a query image by recognizing its nearest reference images from a city-scale dataset. However, public benchmarks only provide noisy GPS labels associated with the training images, which act as weak supervision for learning image-to-image similarities. Such label noise prevents deep neural networks from learning discriminative features for accurate localization. To tackle this challenge, we propose to self-supervise image-to-region similarities in order to fully explore the potential of difficult positive images alongside their sub-regions. The estimated image-to-region similarities can serve as extra training supervision for improving the network in generations, which could in turn gradually refine the fine-grained similarities to achieve optimal performance. Our proposed self-enhanced image-to-region similarity labels effectively deal with the training bottleneck in state-of-the-art pipelines without any additional parameters or manual annotations in both training and inference. Our method outperforms state-of-the-art methods on the standard localization benchmarks by noticeable margins and shows excellent generalization capability on multiple image retrieval datasets.
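The retrieval step described at the start of the abstract (locating a query by finding its nearest reference images) can be illustrated with a minimal sketch. This is not the authors' implementation: the descriptors are toy 2-D vectors standing in for learned features, and the names `cosine`, `localize`, `desc`, and `gps` are hypothetical, chosen only for this example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def localize(query_desc, references, k=1):
    """Return the k references most similar to the query, best first.

    Each reference is a dict with a descriptor ("desc") and a GPS tag ("gps").
    The GPS tag of the top-ranked reference serves as the location estimate.
    """
    ranked = sorted(
        references,
        key=lambda ref: cosine(query_desc, ref["desc"]),
        reverse=True,
    )
    return ranked[:k]

# Toy reference database (assumed values, for illustration only).
references = [
    {"desc": [1.0, 0.0], "gps": (35.0, 139.0)},
    {"desc": [0.0, 1.0], "gps": (35.1, 139.1)},
    {"desc": [0.7, 0.7], "gps": (35.2, 139.2)},
]

top = localize([0.9, 0.1], references, k=2)
estimated_gps = top[0]["gps"]
```

In a real system the descriptors would come from the trained network and the exhaustive sort would typically be replaced by an approximate nearest-neighbor index, since a city-scale database is far too large to scan per query.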
