Paper Title

Evaluation of Explanation Methods of AI -- CNNs in Image Classification Tasks with Reference-based and No-reference Metrics

Paper Authors

Zhukov, A., Benois-Pineau, J., Giot, R.

Paper Abstract

The most popular methods in the AI machine-learning paradigm are mainly black boxes. This is why the explanation of AI decisions is an urgent need. Although dedicated explanation tools have been massively developed, the evaluation of their quality remains an open research question. In this paper, we generalize the methodologies for evaluating post-hoc explainers of CNNs' decisions in visual classification tasks with reference-based and no-reference metrics. We apply them to our previously developed explainers (FEM, MLFEM) and to the popular Grad-CAM. The reference-based metrics are the Pearson correlation coefficient and the Similarity (SIM) computed between the explanation map and its ground truth, represented by a Gaze Fixation Density Map obtained in a psycho-visual experiment. As a no-reference metric, we use the stability metric proposed by Alvarez-Melis and Jaakkola. We study its behaviour and its consensus with the reference-based metrics, and show that under several kinds of degradation of input images this metric agrees with the reference-based ones. Therefore, it can be used to evaluate the quality of explainers when ground truth is not available.
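
As a rough illustration of the metrics named in the abstract, the sketch below (not the authors' code; the function names, map shapes, and toy explainer are assumptions) computes the Pearson correlation coefficient and the Similarity between an explanation map and a gaze fixation density map, plus a local Lipschitz-style stability estimate in the spirit of Alvarez-Melis and Jaakkola.

```python
# Minimal sketch of the evaluation metrics named in the abstract.
# Not the authors' implementation: function names, map shapes, and the toy
# explainer below are illustrative assumptions.
import numpy as np


def pearson_cc(explanation_map, gfdm):
    """Pearson correlation coefficient between the flattened explanation map
    and the ground-truth Gaze Fixation Density Map (GFDM)."""
    x = explanation_map.astype(np.float64).ravel()
    y = gfdm.astype(np.float64).ravel()
    return float(np.corrcoef(x, y)[0, 1])


def similarity(explanation_map, gfdm):
    """SIM metric: sum of element-wise minima after normalizing each map so
    that it sums to one (treating both maps as distributions)."""
    p = explanation_map.astype(np.float64)
    q = gfdm.astype(np.float64)
    p /= p.sum()
    q /= q.sum()
    return float(np.minimum(p, q).sum())


def stability(explainer, image, eps=0.01, num_samples=16, seed=0):
    """No-reference stability in the spirit of Alvarez-Melis & Jaakkola:
    the worst ratio of explanation change to input change over small random
    perturbations of the input (a local Lipschitz-type estimate)."""
    rng = np.random.default_rng(seed)
    base = explainer(image)
    worst = 0.0
    for _ in range(num_samples):
        delta = rng.uniform(-eps, eps, size=image.shape)
        ratio = (np.linalg.norm((explainer(image + delta) - base).ravel())
                 / (np.linalg.norm(delta.ravel()) + 1e-12))
        worst = max(worst, ratio)
    return worst


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gaze_map = rng.random((224, 224))              # stand-in for a real GFDM
    toy_explainer = lambda img: img.mean(axis=-1)  # stand-in for FEM/MLFEM/Grad-CAM
    image = rng.random((224, 224, 3))
    exp_map = toy_explainer(image)
    print("PCC:", pearson_cc(exp_map, gaze_map))
    print("SIM:", similarity(exp_map, gaze_map))
    print("Stability:", stability(toy_explainer, image))
```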
