Paper title
Semantic Cross Attention for Few-shot Learning
Paper authors
Paper abstract
Few-shot learning (FSL) has attracted considerable attention recently. Among existing approaches, metric-based methods train an embedding network that places similar samples close together and dissimilar samples as far apart as possible, and they achieve promising results. FSL is characterized by using only a few images to train a model that can generalize to novel classes in image classification, but this setting makes it difficult to learn visual features that capture variations in the images' appearance. Training is likely to move in the wrong direction because images in the same semantic class may look dissimilar, whereas images in different semantic classes may look similar. We argue that FSL can benefit from additional semantic features to learn discriminative feature representations. This study therefore proposes a multi-task learning approach that treats learning the semantic features of label text as an auxiliary task to boost the performance of the FSL task. The proposed model uses word-embedding representations as semantic features to help train the embedding network, and a semantic cross-attention module to bridge the semantic features into the visual modality. The approach is simple but produces excellent results. We apply it to two previous metric-based FSL methods, and it substantially improves the performance of both. The source code for our model is available on GitHub.
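The two ingredients named in the abstract, a metric-based classifier and cross-attention between visual features and label word embeddings, can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-prototype rule stands in for a generic metric-based method, the attention is plain scaled dot-product attention without learned projections, and all function names and toy vectors are hypothetical.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify_by_prototype(query, support):
    """Metric-based few-shot step: assign the query to the nearest class prototype.

    `support` maps a class name to its list of embedded support samples.
    """
    prototypes = {label: mean_vector(samples) for label, samples in support.items()}
    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(visual_query, word_embeddings):
    """Scaled dot-product attention: a visual feature attends over label word embeddings.

    Assumption: both modalities already share one dimension. A trained model
    would learn projection matrices for queries, keys, and values.
    """
    scale = math.sqrt(len(visual_query))
    scores = [sum(q * k for q, k in zip(visual_query, emb)) / scale
              for emb in word_embeddings]
    weights = softmax(scores)
    attended = [sum(w * emb[i] for w, emb in zip(weights, word_embeddings))
                for i in range(len(visual_query))]
    return weights, attended

# Toy 2-way 2-shot episode with hand-made 2-D "embeddings" (illustrative values only).
support = {
    "cat": [[1.0, 0.0], [0.8, 0.2]],
    "dog": [[0.0, 1.0], [0.2, 0.8]],
}
predicted = classify_by_prototype([0.9, 0.1], support)

# Hypothetical word embeddings for the two label texts, in the same order as the classes.
label_embeddings = [[1.0, 0.0], [0.0, 1.0]]
weights, attended = cross_attention([0.9, 0.1], label_embeddings)
```

In a multi-task setup of the kind the abstract describes, the attended semantic vector would presumably feed an auxiliary loss that is optimized alongside the few-shot classification loss. How the two losses are combined is not stated in the abstract and is left out of this sketch.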