Paper Title

MACE: Model Agnostic Concept Extractor for Explaining Image Classification Networks

Paper Authors

Kumar, Ashish; Sehgal, Karan; Garg, Prerna; Kamakshi, Vidhya; Krishnan, Narayanan C

Paper Abstract

Deep convolutional networks have been quite successful at various image classification tasks. The current methods to explain the predictions of a pre-trained model rely on gradient information, often resulting in saliency maps that focus on the foreground object as a whole. However, humans typically reason by dissecting an image and pointing out the presence of smaller concepts. The final output is often an aggregation of the presence or absence of these smaller concepts. In this work, we propose MACE: a Model Agnostic Concept Extractor, which can explain the working of a convolutional network through smaller concepts. The MACE framework dissects the feature maps generated by a convolution network for an image to extract concept based prototypical explanations. Further, it estimates the relevance of the extracted concepts to the pre-trained model's predictions, a critical aspect required for explaining the individual class predictions, missing in existing approaches. We validate our framework using VGG16 and ResNet50 CNN architectures, and on datasets like Animals With Attributes 2 (AWA2) and Places365. Our experiments demonstrate that the concepts extracted by the MACE framework increase the human interpretability of the explanations, and are faithful to the underlying pre-trained black-box model.
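As a point of reference for the kind of signal MACE dissects, below is a minimal sketch (not the authors' implementation) showing how intermediate feature maps can be captured from a pre-trained VGG16 with a PyTorch forward hook. The layer index, image path, and weight-loading call are illustrative assumptions and may vary with the torchvision version; the actual MACE concept-extraction and relevance-estimation steps are not reproduced here.

```python
# Sketch: capture intermediate feature maps from a pre-trained VGG16.
# These per-channel spatial maps are the kind of input a concept extractor
# such as MACE would dissect; the concept extraction itself is not shown.
import torch
from torchvision import models, transforms
from PIL import Image

# Load an ImageNet-pretrained VGG16 (weights enum available in torchvision >= 0.13).
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()

captured = {}

def hook(module, inputs, output):
    # output has shape (batch, channels, H, W): the layer's feature maps.
    captured["feature_maps"] = output.detach()

# Hook the last convolutional layer of VGG16 (index 28 in model.features);
# the choice of layer is an illustrative assumption.
model.features[28].register_forward_hook(hook)

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# "example.jpg" is a hypothetical input image path.
img = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    logits = model(img)

print(captured["feature_maps"].shape)  # e.g. torch.Size([1, 512, 14, 14])
print(logits.argmax(dim=1))            # predicted class index
```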
