Paper Title
Towards Robust Explanations for Deep Neural Networks
Paper Authors
Paper Abstract
Explanation methods shed light on the decision process of black-box classifiers such as deep neural networks. But their usefulness can be compromised because they are susceptible to manipulations. With this work, we aim to enhance the resilience of explanations. We develop a unified theoretical framework for deriving bounds on the maximal manipulability of a model. Based on these theoretical insights, we present three different techniques to boost robustness against manipulation: training with weight decay, smoothing activation functions, and minimizing the Hessian of the network. Our experimental results confirm the effectiveness of these approaches.
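One of the three techniques above, smoothing activation functions, can be illustrated with a minimal NumPy sketch. The idea is to replace ReLU with a β-parameterized softplus, softplus_β(x) = (1/β)·log(1 + exp(βx)): as β grows the function approaches ReLU, while smaller β lowers its curvature, whose uniform bound β/4 in turn bounds the activation's contribution to the network Hessian. The specific β-parameterization shown here is a common formulation and an illustrative assumption, not necessarily the exact construction used in the paper.

```python
import numpy as np

def softplus(x, beta=1.0):
    # Numerically stable softplus_beta(x) = (1/beta) * log(1 + exp(beta * x)).
    # As beta -> infinity this approaches ReLU; a smaller beta yields a
    # smoother (lower-curvature) activation.
    z = beta * x
    return (np.maximum(z, 0.0) + np.log1p(np.exp(-np.abs(z)))) / beta

def softplus_second_derivative(x, beta=1.0):
    # d^2/dx^2 softplus_beta(x) = beta * s * (1 - s), with s = sigmoid(beta * x).
    # This is maximized at x = 0, giving the uniform curvature bound beta / 4.
    s = 1.0 / (1.0 + np.exp(-beta * x))
    return beta * s * (1.0 - s)

x = np.linspace(-5.0, 5.0, 101)
# Lowering beta directly lowers the curvature bound, and hence the
# activation's contribution to the network Hessian.
print(softplus_second_derivative(x, beta=1.0).max())   # 0.25  (= 1/4)
print(softplus_second_derivative(x, beta=10.0).max())  # 2.5   (= 10/4)
```

The same reasoning motivates the other two techniques: weight decay shrinks the weight matrices that multiply these curvature terms inside the Hessian, and Hessian-norm regularization penalizes the quantity directly.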