Paper Title
Attention-Driven Dynamic Graph Convolutional Network for Multi-Label Image Recognition
Paper Authors
Paper Abstract
Recent studies often exploit Graph Convolutional Networks (GCNs) to model label dependencies and thereby improve recognition accuracy in multi-label image recognition. However, constructing a graph by counting label co-occurrence statistics on the training data may degrade model generalizability, especially when test images contain object co-occurrences that are rare in training. Our goal is to eliminate this bias and enhance the robustness of the learnt features. To this end, we propose an Attention-Driven Dynamic Graph Convolutional Network (ADD-GCN), which dynamically generates a specific graph for each image. ADD-GCN adopts a Dynamic Graph Convolutional Network (D-GCN) to model the relations among content-aware category representations generated by a Semantic Attention Module (SAM). Extensive experiments on public multi-label benchmarks demonstrate the effectiveness of our method, which achieves mAPs of 85.2%, 96.0%, and 95.5% on MS-COCO, VOC2007, and VOC2012, respectively, outperforming current state-of-the-art methods by a clear margin. All code is available at https://github.com/Yejin0111/ADD-GCN.
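To make the abstract's pipeline concrete, below is a minimal PyTorch sketch of the described architecture: a SAM that pools a CNN feature map into one content-aware representation per category via class-specific spatial attention, followed by a D-GCN whose adjacency is estimated per image rather than fixed from training-set co-occurrence counts. All module names, shapes, and the similarity-based dynamic adjacency here are illustrative assumptions, not the authors' exact design; the official implementation is in the linked repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticAttentionModule(nn.Module):
    """Sketch of SAM (hypothetical): one content-aware vector per category,
    obtained by attending over spatial locations of a CNN feature map."""
    def __init__(self, in_dim, num_classes):
        super().__init__()
        # 1x1 conv produces one spatial attention map per category.
        self.attention = nn.Conv2d(in_dim, num_classes, kernel_size=1)

    def forward(self, feats):                       # feats: (B, C, H, W)
        attn = self.attention(feats)                # (B, K, H, W)
        attn = attn.flatten(2).softmax(dim=-1)      # normalize over H*W
        feats = feats.flatten(2)                    # (B, C, H*W)
        # Attention-weighted sum over locations -> (B, K, C).
        return torch.einsum('bkn,bcn->bkc', attn, feats)

class DynamicGCN(nn.Module):
    """Sketch of D-GCN (hypothetical): graph convolution whose adjacency is
    recomputed for each image from the category nodes themselves, replacing
    a static co-occurrence graph."""
    def __init__(self, dim):
        super().__init__()
        self.weight = nn.Linear(dim, dim, bias=False)

    def forward(self, nodes):                       # nodes: (B, K, C)
        # Per-image adjacency from pairwise node similarity.
        adj = torch.softmax(nodes @ nodes.transpose(1, 2), dim=-1)  # (B, K, K)
        return F.relu(self.weight(adj @ nodes))     # propagate and transform

class ADDGCNSketch(nn.Module):
    def __init__(self, backbone, feat_dim, num_classes):
        super().__init__()
        self.backbone = backbone    # e.g. a ResNet trunk returning (B, C, H, W)
        self.sam = SemanticAttentionModule(feat_dim, num_classes)
        self.dgcn = DynamicGCN(feat_dim)
        self.classifier = nn.Linear(feat_dim, 1)    # one score per category node

    def forward(self, images):
        feats = self.backbone(images)               # (B, C, H, W)
        nodes = self.sam(feats)                     # (B, K, C) category nodes
        nodes = self.dgcn(nodes)                    # relation-enhanced nodes
        return self.classifier(nodes).squeeze(-1)   # (B, K) multi-label logits
```

The key design point the abstract argues for is visible in `DynamicGCN.forward`: the adjacency matrix depends on the current image's category representations, so unusual object co-occurrences at test time are not penalized by a graph frozen from training-set statistics.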