Integrating Language Guidance into Vision-based Deep Metric Learning

Paper title

Integrating Language Guidance into Vision-based Deep Metric Learning

Authors

Karsten Roth, Oriol Vinyals, Zeynep Akata

Abstract

Deep Metric Learning (DML) proposes to learn metric spaces which encode semantic similarities as embedding space distances. These spaces should be transferable to classes beyond those seen during training. Commonly, DML methods task networks to solve contrastive ranking tasks defined over binary class assignments. However, such approaches ignore higher-level semantic relations between the actual classes. This causes learned embedding spaces to encode incomplete semantic context and misrepresent the semantic relation between classes, impacting the generalizability of the learned metric space. To tackle this issue, we propose a language guidance objective for visual similarity learning. Leveraging language embeddings of expert- and pseudo-classnames, we contextualize and realign visual representation spaces corresponding to meaningful language semantics for better semantic consistency. Extensive experiments and ablations provide a strong motivation for our proposed approach and show language guidance offering significant, model-agnostic improvements for DML, achieving competitive and state-of-the-art results on all benchmarks. Code available at https://github.com/ExplainableML/LanguageGuidance_for_DML.
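
The abstract does not spell out the form of the language guidance objective. One plausible reading is a matching term that pulls the pairwise similarity structure of the visual embeddings toward that of the class-name language embeddings. The sketch below illustrates that idea only. The function names, the use of cosine similarity, the softmax temperature, and the KL-divergence direction are all assumptions made for illustration and are not taken from the paper; the authors' actual implementation is in the repository linked above.

```python
import math


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarities between row vectors (assumes non-zero rows)."""
    normed = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v))
        normed.append([x / norm for x in v])
    return [[sum(a * b for a, b in zip(u, w)) for w in normed] for u in normed]


def row_softmax(matrix, temperature=1.0):
    """Turn each row of similarities into a probability distribution."""
    out = []
    for row in matrix:
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp((s - peak) / temperature) for s in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def language_guidance_loss(visual_embeddings, language_embeddings, temperature=1.0):
    """Hypothetical matching loss: mean row-wise KL(language || visual).

    Both inputs hold one embedding per sample in the batch, in the same order.
    This is an illustrative assumption, not the paper's stated objective.
    """
    p_lang = row_softmax(cosine_similarity_matrix(language_embeddings), temperature)
    p_vis = row_softmax(cosine_similarity_matrix(visual_embeddings), temperature)
    kl_rows = [
        sum(p * math.log(p / q) for p, q in zip(row_p, row_q))
        for row_p, row_q in zip(p_lang, p_vis)
    ]
    return sum(kl_rows) / len(kl_rows)
```

In a training setup along these lines, each sample's language embedding would be the embedding of its class name, and a term like this would be added to a standard DML ranking loss. The loss is zero when both spaces induce the same similarity distributions and grows as they diverge.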

Paper title