Paper Title

Reliable Post hoc Explanations: Modeling Uncertainty in Explainability

Authors

Dylan Slack, Sophie Hilgard, Sameer Singh, Himabindu Lakkaraju

Abstract

As black box explanations are increasingly being employed to establish model credibility in high-stakes settings, it is important to ensure that these explanations are accurate and reliable. However, prior work demonstrates that explanations generated by state-of-the-art techniques are inconsistent, unstable, and provide very little insight into their correctness and reliability. In addition, these methods are computationally inefficient and require significant hyper-parameter tuning. In this paper, we address these challenges by developing a novel Bayesian framework for generating local explanations along with their associated uncertainty. We instantiate this framework to obtain Bayesian versions of LIME and KernelSHAP, which output credible intervals for the feature importances, capturing the associated uncertainty. The resulting explanations not only enable us to make concrete inferences about their quality (e.g., there is a 95% chance that the feature importance lies within the given range), but are also highly consistent and stable. We carry out a detailed theoretical analysis that leverages this uncertainty to estimate how many perturbations to sample, and how to sample for faster convergence. This work makes the first attempt at addressing several critical issues with popular explanation methods in one shot, thereby generating consistent, stable, and reliable explanations with guarantees in a computationally efficient manner. Experimental evaluation with multiple real-world datasets and user studies demonstrates the efficacy of the proposed framework.
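To make the idea of a local explanation with credible intervals concrete, the following is a minimal sketch and not the authors' implementation. It assumes Gaussian perturbations around the instance, an RBF proximity kernel, a zero-mean Gaussian prior on the coefficients, and a plug-in estimate of the noise variance. The function name `bayes_local_explanation` and all of its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np

def bayes_local_explanation(predict, x, n_samples=500, noise=0.5,
                            kernel_width=1.0, prior_precision=1.0, seed=0):
    """Illustrative Bayesian local surrogate (a simplification, not the paper's model).

    Returns the posterior mean of each feature importance together with an
    approximate 95% credible interval (lower, upper).
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]

    # Sample perturbations around the instance and query the black box.
    Z = x + noise * rng.standard_normal((n_samples, d))
    y = predict(Z)

    # Proximity weights: perturbations closer to x count more.
    dist2 = np.sum((Z - x) ** 2, axis=1)
    w = np.exp(-dist2 / kernel_width ** 2)

    # Weighted linear model with an intercept, centered at x.
    X = np.hstack([np.ones((n_samples, 1)), Z - x])
    XtW = X.T * w

    # Gaussian prior N(0, I / prior_precision) gives a ridge-like posterior.
    V = np.linalg.inv(XtW @ X + prior_precision * np.eye(d + 1))
    mean = V @ (XtW @ y)

    # Plug-in weighted estimate of the residual noise variance (assumption).
    resid = y - X @ mean
    sigma2 = np.sum(w * resid ** 2) / np.sum(w)

    # Gaussian approximation to the 95% credible interval.
    std = np.sqrt(sigma2 * np.diag(V))
    lower, upper = mean - 1.96 * std, mean + 1.96 * std

    # Drop the intercept; keep only per-feature importances.
    return mean[1:], lower[1:], upper[1:]

if __name__ == "__main__":
    coef = np.array([2.0, -1.0, 0.0])
    black_box = lambda Z: Z @ coef  # stand-in for a real model
    mean, lower, upper = bayes_local_explanation(black_box, np.array([1.0, 2.0, 3.0]))
    for j in range(len(mean)):
        print(f"feature {j}: {mean[j]:.3f} in [{lower[j]:.3f}, {upper[j]:.3f}]")
```

In a sketch like this, the interval widths shrink as more perturbations are sampled, which is the kind of signal the abstract describes using to decide how many perturbations are needed.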
