Paper Title
Mixture-based Feature Space Learning for Few-shot Image Classification
Paper Authors
Paper Abstract
We introduce Mixture-based Feature Space Learning (MixtFSL) for obtaining a rich and robust feature representation in the context of few-shot image classification. Previous works have proposed to model each base class either with a single point or with a mixture model, relying on offline clustering algorithms. In contrast, we propose to model base classes with mixture models by simultaneously training the feature extractor and learning the mixture model parameters in an online manner. This results in a richer and more discriminative feature space, which can be employed to classify novel examples from very few samples. Two main stages are proposed to train the MixtFSL model. First, the multimodal mixtures for each base class and the feature extractor parameters are learned using a combination of two loss functions. Second, the resulting network and mixture models are progressively refined through a leader-follower learning procedure, which uses the current estimate as a "target" network. This target network is used to make a consistent assignment of instances to mixture components, which increases performance and stabilizes training. The effectiveness of our end-to-end feature space learning approach is demonstrated with extensive experiments on four standard datasets and four backbones. Notably, we demonstrate that when we combine our robust representation with recent alignment-based approaches, we achieve new state-of-the-art results in the inductive setting, with absolute 5-shot classification accuracies of 82.45% on miniImageNet, 88.20% on tieredImageNet, and 60.70% on FC100, all using the ResNet-12 backbone.
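The abstract describes the mechanism only at a high level, so below is a minimal PyTorch sketch of one way such a mixture-based classifier could look: each base class owns several learnable component vectors, an embedding is hard-assigned to its nearest component, and a class is scored by its best-matching component. Everything here (the function name `assign_and_classify`, the cosine-similarity metric, the temperature of 0.05, and the toy tensors) is an illustrative assumption rather than the paper's implementation; the two actual loss functions and the leader-follower refinement stage are not reproduced.

```python
import torch
import torch.nn.functional as F

def assign_and_classify(features, components, component_to_class, temperature=0.05):
    """Assign embeddings to their nearest mixture component and score classes.

    features:           (B, D) embeddings from the feature extractor
    components:         (M, D) learnable component centers over all base classes
    component_to_class: (M,)   long tensor mapping each component to its class
    """
    f = F.normalize(features, dim=-1)    # unit-norm embeddings
    c = F.normalize(components, dim=-1)  # unit-norm component centers
    sim = f @ c.t() / temperature        # (B, M) scaled cosine similarities

    assignment = sim.argmax(dim=1)       # hard assignment to the nearest component

    # A class's logit is its best-matching component's similarity, so a
    # multimodal class is scored by whichever of its modes fits the instance.
    num_classes = int(component_to_class.max().item()) + 1
    logits = sim.new_full((features.size(0), num_classes), float("-inf"))
    for k in range(num_classes):
        logits[:, k] = sim[:, component_to_class == k].max(dim=1).values
    return assignment, logits

# Toy usage: 4 base classes with 3 components each, random embeddings.
torch.manual_seed(0)
feats = torch.randn(8, 64)                       # batch of 8 embeddings
comps = torch.randn(12, 64, requires_grad=True)  # 12 components, learned online
comp_to_cls = torch.arange(12) // 3              # components 0-2 -> class 0, etc.
labels = torch.randint(0, 4, (8,))               # ground-truth base-class labels

assignment, logits = assign_and_classify(feats, comps, comp_to_cls)
loss = F.cross_entropy(logits, labels)           # one possible classification loss
loss.backward()                                  # gradients flow into the components
```

In the second training stage described in the abstract, the component assignments would instead come from a lagging "target" copy of the model rather than the network being trained, which keeps instance-to-component assignments consistent across updates and stabilizes training.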