Paper Title

Explaining Neural Networks without Access to Training Data

Authors

Marton, Sascha, Lüdtke, Stefan, Bartelt, Christian, Tschalzev, Andrej, Stuckenschmidt, Heiner

Abstract

We consider generating explanations for neural networks in cases where the network's training data is not accessible, for instance due to privacy or safety issues. Recently, $\mathcal{I}$-Nets have been proposed as a sample-free approach to post-hoc, global model interpretability that does not require access to training data. They formulate interpretation as a machine learning task that maps network representations (parameters) to a representation of an interpretable function. In this paper, we extend the $\mathcal{I}$-Net framework to the cases of standard and soft decision trees as surrogate models. We propose a suitable decision tree representation and design of the corresponding $\mathcal{I}$-Net output layers. Furthermore, we make $\mathcal{I}$-Nets applicable to real-world tasks by considering more realistic distributions when generating the $\mathcal{I}$-Net's training data. We empirically evaluate our approach against traditional global, post-hoc interpretability approaches and show that it achieves superior results when the training data is not accessible.
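
To make the parameter-to-surrogate mapping concrete, below is a minimal, hypothetical sketch of how an $\mathcal{I}$-Net-style model could map a target network's flattened parameters to a soft-decision-tree representation (per-node split weights and biases, per-leaf class distributions). The layer sizes, tree depth, class `INetSketch`, and all other identifiers are illustrative assumptions, not the paper's actual architecture or output-layer design.

```python
# Hypothetical sketch of the I-Net idea: a network that maps the flattened
# parameters of a target model to the parameters of a soft decision tree
# surrogate. Layer sizes, tree depth, and all names are assumptions made
# for illustration only.
import torch
import torch.nn as nn


class INetSketch(nn.Module):
    def __init__(self, n_target_params, n_features, n_classes, depth=3):
        super().__init__()
        self.n_features = n_features
        self.n_classes = n_classes
        self.n_internal = 2 ** depth - 1   # split nodes of a complete binary tree
        self.n_leaves = 2 ** depth         # leaf nodes
        self.body = nn.Sequential(
            nn.Linear(n_target_params, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
        )
        # Output heads produce the surrogate-tree representation:
        # a linear split (weight vector + bias) per internal node
        # and a class distribution per leaf.
        self.split_w = nn.Linear(512, self.n_internal * n_features)
        self.split_b = nn.Linear(512, self.n_internal)
        self.leaf_logits = nn.Linear(512, self.n_leaves * n_classes)

    def forward(self, flat_params):
        h = self.body(flat_params)
        return {
            "split_w": self.split_w(h).view(-1, self.n_internal, self.n_features),
            "split_b": self.split_b(h),
            "leaf_probs": self.leaf_logits(h)
                .view(-1, self.n_leaves, self.n_classes)
                .softmax(dim=-1),
        }


# Usage: flatten a target network's parameters and predict a tree from them.
target = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
flat = torch.cat([p.detach().flatten() for p in target.parameters()]).unsqueeze(0)
tree = INetSketch(n_target_params=flat.shape[1], n_features=10, n_classes=2)(flat)
```

In this sketch the $\mathcal{I}$-Net is trained, as the abstract describes, as an ordinary supervised model whose inputs are (flattened) network parameters and whose targets are surrogate-tree representations, so no access to the original training data is needed at explanation time.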
