Paper Title

Multimodal Prototypical Networks for Few-shot Learning

Authors

Frederik Pahde, Mihai Puscas, Tassilo Klein, Moin Nabi

Abstract

Although providing exceptional results for many computer vision tasks, state-of-the-art deep learning algorithms struggle catastrophically in low-data scenarios. However, if data in an additional modality (e.g., text) exists, it can compensate for the lack of data and improve classification results. To overcome this data scarcity, we design a cross-modal feature generation framework capable of enriching the sparsely populated embedding space in few-shot scenarios by leveraging data from the auxiliary modality. Specifically, we train a generative model that maps text data into the visual feature space to obtain more reliable prototypes. This allows us to exploit data from additional modalities (e.g., text) during training, while the ultimate task at test time remains classification with visual data only. We show that nearest-neighbor classification is a viable approach in this setting and outperforms state-of-the-art single-modal and multimodal few-shot learning methods on the CUB-200 and Oxford-102 datasets.
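To make the idea described in the abstract concrete, the sketch below shows one plausible way to form multimodal prototypes and classify queries by nearest prototype. It is not the authors' code: the generator `text_to_visual` and the interpolation weight `lam` are illustrative assumptions, as are all function and variable names.

```python
# Minimal sketch (assumptions labeled): visual support features are enriched with
# features generated from text, the per-class mean serves as a prototype, and
# test-time queries (visual only) are assigned to the nearest prototype.
import torch

def multimodal_prototypes(visual_feats, text_feats, labels, text_to_visual, lam=0.5):
    """Build one prototype per class from visual and text-generated features.

    visual_feats: (N, D) visual embeddings of the few support images
    text_feats:   (N, T) text embeddings of the same support samples
    labels:       (N,)   integer class labels
    text_to_visual: generator mapping text embeddings into the visual feature space (assumed interface)
    lam: interpolation weight between visual and generated prototypes (assumed)
    """
    generated = text_to_visual(text_feats)            # (N, D) features in visual space
    protos = []
    for c in labels.unique(sorted=True):
        mask = labels == c
        visual_proto = visual_feats[mask].mean(dim=0)
        generated_proto = generated[mask].mean(dim=0)
        protos.append(lam * visual_proto + (1 - lam) * generated_proto)
    return torch.stack(protos)                        # (C, D) one prototype per class

def nearest_prototype(query_feats, prototypes):
    """Classify visual-only queries by Euclidean distance to the nearest prototype."""
    dists = torch.cdist(query_feats, prototypes)      # (Q, C) pairwise distances
    return dists.argmin(dim=1)                        # (Q,) predicted class indices
```

At test time only `nearest_prototype` and the visual query features are needed, which matches the abstract's point that the auxiliary text modality is used during training while inference remains purely visual.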
