Paper Title

Generalizing Adversarial Explanations with Grad-CAM

Authors

Tanmay Chakraborty, Utkarsh Trehan, Khawla Mallat, Jean-Luc Dugelay

Abstract

Gradient-weighted Class Activation Mapping (Grad-CAM) is an example-based explanation method that provides a gradient activation heatmap as an explanation for Convolutional Neural Network (CNN) models. The drawback of this method is that it cannot be used to generalize CNN behaviour. In this paper, we present a novel method that extends Grad-CAM from example-based explanations to a method for explaining global model behaviour. This is achieved by introducing two new metrics, (i) Mean Observed Dissimilarity (MOD) and (ii) Variation in Dissimilarity (VID), for model generalization. These metrics are computed by comparing the Normalized Inverted Structural Similarity Index (NISSIM) of the Grad-CAM-generated heatmaps for samples from the original test set and samples from the adversarial test set. For our experiments, we study adversarial attacks on deep models such as VGG16, ResNet50, and ResNet101, and wide models such as InceptionNetv3 and XceptionNet, using the Fast Gradient Sign Method (FGSM). We then compute the metrics MOD and VID for the automatic face recognition (AFR) use case with the VGGFace2 dataset. We observe a consistent shift in the region highlighted in the Grad-CAM heatmap, reflecting its participation in the decision making, across all models under adversarial attack. The proposed method can be used to understand adversarial attacks and to explain the behaviour of black-box CNN models for image analysis.
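To make the pipeline in the abstract concrete, below is a minimal sketch in Python/PyTorch. The FGSM step follows the standard one-step formulation. The NISSIM normalization (mapping SSIM from [-1, 1] to a dissimilarity in [0, 1]) and the use of variance for VID are assumptions on our part, since the abstract does not spell out the exact definitions; the function names (fgsm_attack, nissim, mod_vid) are hypothetical and the Grad-CAM heatmaps themselves are taken as given inputs.

```python
import numpy as np
import torch
import torch.nn.functional as F
from skimage.metrics import structural_similarity

def fgsm_attack(model, x, y, eps):
    """One-step FGSM: perturb x along the sign of the input gradient.
    Assumes inputs are normalized to [0, 1]."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

def nissim(h_orig, h_adv):
    """Normalized Inverted SSIM between two Grad-CAM heatmaps.
    SSIM lies in [-1, 1]; (1 - SSIM) / 2 maps it to a dissimilarity
    in [0, 1] (assumed normalization; the paper's may differ)."""
    ssim = structural_similarity(h_orig, h_adv, data_range=1.0)
    return (1.0 - ssim) / 2.0

def mod_vid(heatmaps_orig, heatmaps_adv):
    """MOD: mean NISSIM over paired samples.
    VID: its variance (assumed; could also be standard deviation)."""
    d = np.array([nissim(a, b) for a, b in zip(heatmaps_orig, heatmaps_adv)])
    return d.mean(), d.var()
```

Heatmaps from any Grad-CAM implementation (2D float arrays scaled to [0, 1]) can be fed to mod_vid; under this reading, a higher MOD on the adversarial test set indicates a consistent shift in the regions the model attends to, which is the global behaviour the metrics are meant to capture.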
