Paper Title
Generative Model-driven Structure Aligning Discriminative Embeddings for Transductive Zero-shot Learning
Paper Authors
Abstract
Zero-shot Learning (ZSL) is a transfer learning technique that aims at transferring knowledge from seen classes to unseen classes. This knowledge transfer is possible because of an underlying semantic space that is common to the seen and unseen classes. Most existing approaches learn a projection function, which maps visual data to semantic data, using labelled seen class data. In this work, we propose a shallow but effective neural network-based model for learning such a projection function, which aligns the visual and semantic data in a latent space while simultaneously making the latent space embeddings discriminative. Since the above projection function is learned using only seen class data, the so-called projection domain shift arises. We propose a transductive approach to reduce the effect of this domain shift, in which we utilize unlabelled visual data from unseen classes to generate corresponding semantic features for the unseen-class visual samples. While these semantic features are initially generated using a conditional variational auto-encoder, they are then used along with the seen class data to improve the projection function. We experiment in both the inductive and the transductive settings of ZSL and generalized ZSL, and show superior performance on the standard benchmark datasets AWA1, AWA2, CUB, SUN, FLO, and APY. We also show the efficacy of our model in the extremely limited labelled-data regime on different datasets in the context of ZSL.