Paper Title
A generalizable saliency map-based interpretation of model outcome
Authors
Abstract
One of the significant challenges of deep neural networks is that their complex nature prevents humans from comprehending the network's outcomes. Consequently, the applicability of complex machine learning models is limited in safety-critical domains, where errors incur risk to life and property. To fully exploit the capabilities of complex neural networks, we propose a non-intrusive interpretability technique that uses the input and output of the model to generate a saliency map. The method works by empirically optimizing a randomly initialized input mask, localizing and weighting individual pixels according to their sensitivity towards the target class. Our experiments show that the proposed interpretability approach performs better than existing saliency map-based methods at localizing the relevant input pixels. Furthermore, to obtain a global perspective on the target-specific explanation, we propose a saliency map reconstruction approach that generates acceptable variations of the salient inputs from the space of the input data distribution for which the model outcome remains unaltered. Experiments show that our interpretability method can reconstruct the salient part of the input with a classification accuracy of 89%.
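The abstract describes optimizing a randomly initialized input mask so that pixel weights reflect sensitivity towards the target class. Below is a minimal PyTorch sketch of that general idea, not the authors' exact method: `model`, `image`, `target_class`, the loss weights, and the optimizer settings are all illustrative assumptions.

```python
# Minimal sketch of mask-based saliency optimization (illustrative only).
import torch
import torch.nn.functional as F
import torchvision.models as models

model = models.resnet18(weights=None).eval()  # placeholder classifier
image = torch.rand(1, 3, 224, 224)            # placeholder input image
target_class = 0                              # placeholder target label

# Randomly initialized mask with one learnable weight per pixel.
mask = torch.rand(1, 1, 224, 224, requires_grad=True)
optimizer = torch.optim.Adam([mask], lr=0.05)

for _ in range(100):
    optimizer.zero_grad()
    m = torch.sigmoid(mask)                   # keep mask values in [0, 1]
    masked_input = image * m                  # retain only "salient" pixels
    score = F.softmax(model(masked_input), dim=1)[0, target_class]
    # Encourage a high target-class score under a sparse mask, so only
    # pixels the model is sensitive to end up with large weights.
    loss = -torch.log(score + 1e-8) + 0.1 * m.abs().mean()
    loss.backward()
    optimizer.step()

saliency_map = torch.sigmoid(mask).detach()   # per-pixel saliency estimate
```

The sparsity term is one common way to force the mask to concentrate on the pixels that actually drive the prediction; the paper's own optimization objective and reconstruction procedure may differ.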