Paper title
Cross-layer Attention Network for Fine-grained Visual Categorization
Paper authors
Abstract
Learning discriminative representations of subtle, localized details plays a significant role in Fine-grained Visual Categorization (FGVC). Unlike previous attention-based works, our work does not explicitly define or localize the part regions of interest; instead, we leverage the complementary properties of different stages of the network and build a mutual refinement mechanism between the mid-level feature maps and the top-level feature map with our proposed Cross-layer Attention Network (CLAN). Specifically, CLAN is composed of 1) the Cross-layer Context Attention (CLCA) module, which enhances the global context information in the intermediate feature maps with the help of the top-level feature map, thereby improving the expressive power of the middle layers, and 2) the Cross-layer Spatial Attention (CLSA) module, which takes advantage of the local attention in the mid-level feature maps to boost the feature extraction of local regions in the top-level feature map. Experimental results show that our approach achieves state-of-the-art performance on three publicly available fine-grained recognition datasets (CUB-200-2011, Stanford Cars and FGVC-Aircraft). Ablation studies and visualizations are provided to help understand our approach.
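The abstract describes the two modules only at a high level, so the following is a rough NumPy sketch of the mutual refinement idea rather than the authors' implementation. Everything concrete in it is an assumption made for illustration: the function names `context_attention` and `spatial_attention`, the dot-product attention form, the random projection matrices `w_q`, `w_k`, `w_v`, the residual connections, and the tensor shapes.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def context_attention(mid, top, w_q, w_k, w_v):
    """Hypothetical CLCA-style step (assumed form, not from the paper):
    every mid-level position attends over all top-level positions, and the
    gathered global context is added back to the mid-level map."""
    c_m, h_m, w_m = mid.shape
    c_t = top.shape[0]
    mid_flat = mid.reshape(c_m, -1).T   # (N_m, C_m)
    top_flat = top.reshape(c_t, -1).T   # (N_t, C_t)
    q = mid_flat @ w_q                  # (N_m, d)
    k = top_flat @ w_k                  # (N_t, d)
    v = top_flat @ w_v                  # (N_t, C_m)
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)  # (N_m, N_t)
    context = attn @ v                  # (N_m, C_m)
    return mid + context.T.reshape(c_m, h_m, w_m)


def spatial_attention(mid, top):
    """Hypothetical CLSA-style step (assumed form, not from the paper):
    a spatial mask derived from the mid-level map reweights the top-level
    map. Assumes the mid-level size is an integer multiple of the top-level
    size."""
    h_m, w_m = mid.shape[1:]
    h_t, w_t = top.shape[1:]
    fh, fw = h_m // h_t, w_m // w_t
    saliency = mid.mean(axis=0)                                    # (H_m, W_m)
    pooled = saliency.reshape(h_t, fh, w_t, fw).mean(axis=(1, 3))  # (H_t, W_t)
    mask = 1.0 / (1.0 + np.exp(-pooled))                           # values in (0, 1)
    return top * (1.0 + mask)  # mask broadcasts over the channel axis


# Toy inputs: shapes and projections are arbitrary illustrative choices.
rng = np.random.default_rng(0)
mid = rng.standard_normal((8, 14, 14))   # mid-level map: (C_m, H_m, W_m)
top = rng.standard_normal((16, 7, 7))    # top-level map: (C_t, H_t, W_t)
d = 4
w_q = rng.standard_normal((8, d)) * 0.1
w_k = rng.standard_normal((16, d)) * 0.1
w_v = rng.standard_normal((16, 8)) * 0.1

mid_refined = context_attention(mid, top, w_q, w_k, w_v)
top_refined = spatial_attention(mid, top)
print(mid_refined.shape, top_refined.shape)
```

In this sketch both refined maps keep the shape of their inputs, so either step could be dropped into a backbone between existing stages; how the actual CLAN modules are wired, and what they use in place of these assumed projections, is not stated in the abstract.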