Paper Title
Attributable Visual Similarity Learning
Paper Authors
Paper Abstract
This paper proposes an attributable visual similarity learning (AVSL) framework for a more accurate and explainable similarity measure between images. Most existing similarity learning methods exacerbate unexplainability by mapping each sample to a single point in the embedding space and comparing samples with a distance metric (e.g., Mahalanobis distance, Euclidean distance). Motivated by human semantic similarity cognition, we propose a generalized similarity learning paradigm that represents the similarity between two images with a graph and then infers the overall similarity accordingly. Furthermore, we establish a bottom-up similarity construction and top-down similarity inference framework to infer the similarity based on semantic hierarchy consistency. We first identify unreliable higher-level similarity nodes and then correct them using the most coherent adjacent lower-level similarity nodes, which simultaneously preserves traces for similarity attribution. Extensive experiments on the CUB-200-2011, Cars196, and Stanford Online Products datasets demonstrate significant improvements over existing deep similarity learning methods and verify the interpretability of our framework. Code is available at https://github.com/zbr17/AVSL.
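
The top-down correction step described in the abstract can be illustrated with a minimal sketch. This is not the released AVSL implementation: the function names (`node_similarities`, `correct_high_level`), the cosine-similarity nodes, the reliability threshold, and the cross-level coherence weights are all illustrative assumptions introduced here to make the idea concrete.

```python
# Minimal sketch (not the authors' code) of the idea in the abstract:
# high-level similarity nodes judged unreliable are replaced by a weighted
# average of their most coherent lower-level neighbours.
import numpy as np


def node_similarities(feats_a, feats_b):
    """Cosine similarity between corresponding nodes of two images at one level."""
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    return np.sum(a * b, axis=1)


def correct_high_level(sim_high, sim_low, link, reliability, threshold=0.5):
    """Replace unreliable high-level similarity nodes (top-down correction).

    sim_high: (H,) high-level similarity nodes.
    sim_low:  (L,) lower-level similarity nodes.
    link:     (H, L) non-negative cross-level coherence weights.
    reliability: (H,) confidence score per high-level node in [0, 1].
    """
    weights = link / (link.sum(axis=1, keepdims=True) + 1e-12)
    inferred = weights @ sim_low              # (H,) values inferred from the level below
    unreliable = reliability < threshold      # boolean mask of unreliable nodes
    return np.where(unreliable, inferred, sim_high)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    low_a = rng.normal(size=(8, 16))          # low-level node embeddings, image A
    low_b = rng.normal(size=(8, 16))          # low-level node embeddings, image B
    sim_low = node_similarities(low_a, low_b)  # (8,) low-level similarity nodes
    sim_high = rng.uniform(size=3)            # (3,) high-level similarity nodes
    link = rng.uniform(size=(3, 8))           # cross-level coherence weights
    reliability = np.array([0.9, 0.2, 0.7])   # node 1 is deemed unreliable
    corrected = correct_high_level(sim_high, sim_low, link, reliability)
    overall = corrected.mean()                # aggregate nodes into one similarity score
    print(corrected, overall)
```

In the actual framework the reliability estimates and the cross-level link weights are learned end-to-end; the fixed random values above are used only to keep the sketch self-contained and runnable.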