Paper Title
VL-Taboo: An Analysis of Attribute-based Zero-shot Capabilities of Vision-Language Models
Paper Authors
Paper Abstract
Vision-language models trained on large, randomly collected data have had a significant impact in many areas since they appeared. But while they show strong performance on various tasks, such as image-text retrieval, their inner workings are still not fully understood. The current work analyses the true zero-shot capabilities of those models. We start with an analysis of the training corpus, assessing to what extent (and which of) the test classes are really zero-shot, and how this correlates with individual class performance. We follow up with an analysis of the attribute-based zero-shot learning capabilities of these models, evaluating how well this classical zero-shot notion emerges from large-scale webly supervision. We leverage the recently released LAION400M data corpus as well as the publicly available pretrained models of CLIP, OpenCLIP, and FLAVA, evaluating the attribute-based zero-shot capabilities on the CUB and AWA2 benchmarks. Our analysis shows that: (i) most of the classes in popular zero-shot benchmarks are observed (a lot) during pre-training; (ii) zero-shot performance mainly stems from the models' ability to recognize class labels whenever they are present in the text, and a significantly lower-performing capability of attribute-based zero-shot learning is only observed when class labels are not used; (iii) the number of attributes used can have a significant effect on performance, and can easily cause a large performance drop.
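To make the label-based vs. attribute-based comparison described in the abstract concrete, the following is a minimal sketch of both evaluation modes using a pretrained CLIP model via Hugging Face's `transformers` API. This is not the authors' exact protocol; the class names, attribute lists, prompt templates, and image path are illustrative placeholders standing in for the CUB/AWA2 annotations used in the paper.

```python
# Minimal sketch (assumed setup, not the paper's exact pipeline):
# compare label-based vs. attribute-based zero-shot classification
# with a pretrained CLIP model.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical classes, each with a few attributes; the real benchmarks
# supply much richer per-class attribute annotations.
classes = {
    "cardinal": ["red plumage", "a crested head", "a cone-shaped bill"],
    "blue jay": ["blue plumage", "a white chest", "a black collar"],
}

def classify(image: Image.Image, use_labels: bool) -> str:
    """Score the image against one prompt per class and return the argmax."""
    if use_labels:
        # Label-based prompts: the class name appears in the text.
        prompts = [f"a photo of a {name}" for name in classes]
    else:
        # Attribute-only prompts: the class label is deliberately withheld,
        # the regime where the paper observes much lower performance.
        prompts = [
            "a photo of a bird with " + ", ".join(attrs)
            for attrs in classes.values()
        ]
    inputs = processor(text=prompts, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape: (1, num_classes)
    return list(classes)[logits.argmax(dim=-1).item()]

image = Image.open("bird.jpg")  # placeholder path
print("label-based:", classify(image, use_labels=True))
print("attribute-based:", classify(image, use_labels=False))
```

Varying how many attributes are joined into the prompt in the attribute-only branch is one simple way to probe finding (iii), the sensitivity of performance to the number of attributes used.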