Paper Title

Attribution-based Explanations that Provide Recourse Cannot be Robust

Authors

Hidde Fokkema, Rianne de Heide, Tim van Erven

Abstract

Different users of machine learning methods require different explanations, depending on their goals. To make machine learning accountable to society, one important goal is to get actionable options for recourse, which allow an affected user to change the decision $f(x)$ of a machine learning system by making limited changes to its input $x$. We formalize this by providing a general definition of recourse sensitivity, which needs to be instantiated with a utility function that describes which changes to the decisions are relevant to the user. This definition applies to local attribution methods, which attribute an importance weight to each input feature. It is often argued that such local attributions should be robust, in the sense that a small change in the input $x$ that is being explained should not cause a large change in the feature weights. However, we prove formally that it is in general impossible for any single attribution method to be both recourse sensitive and robust at the same time. It follows that there must always exist counterexamples to at least one of these properties. We provide such counterexamples for several popular attribution methods, including LIME, SHAP, Integrated Gradients and SmoothGrad. Our results also cover counterfactual explanations, which may be viewed as attributions that describe a perturbation of $x$. We further discuss possible ways to work around our impossibility result, for instance by allowing the output to consist of sets with multiple attributions, and we provide sufficient conditions for specific classes of continuous functions to be recourse sensitive. Finally, we strengthen our impossibility result for the restricted case where users are only able to change a single attribute of $x$, by providing an exact characterization of the functions $f$ to which impossibility applies.
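The tension between recourse and robustness can be seen on a small toy example. The sketch below is our own illustration under simplifying assumptions, not the construction used in the paper's proofs: the classifier `f`, the threshold of 1, and the "nearest single-feature counterfactual" attribution `nearest_recourse` are all hypothetical. Two inputs about 0.014 apart both receive valid recourse (applying the attribution flips the decision), yet their attributions point along different features and differ by about 1.4, so this particular attribution violates any small Lipschitz-style robustness bound near the diagonal x1 = x2.

```python
import math

# Hypothetical toy classifier (an illustrative assumption, not from the paper):
# f(x) = 1 ("approved") if either feature exceeds 1, else 0 ("rejected").
def f(x):
    return 1 if x[0] > 1.0 or x[1] > 1.0 else 0


def nearest_recourse(x, margin=1e-6):
    """Hypothetical recourse attribution: the smallest change to a single
    feature that moves x into the approved region (a nearest counterfactual).
    Returns the perturbation delta, so that f(x + delta) == 1."""
    if f(x) == 1:
        return (0.0, 0.0)  # already approved: nothing to change
    # Cost of pushing each feature just past the threshold of 1.
    costs = [1.0 + margin - xi for xi in x]
    i = min(range(len(x)), key=lambda j: costs[j])  # cheapest feature (ties -> first)
    delta = [0.0, 0.0]
    delta[i] = costs[i]
    return tuple(delta)


def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Two rejected inputs that are very close to each other...
x_a = (0.01, 0.0)
x_b = (0.0, 0.01)
d_a = nearest_recourse(x_a)  # recourse along feature 1
d_b = nearest_recourse(x_b)  # recourse along feature 2

# ...both receive valid recourse: applying the attribution flips the decision.
for x, d in [(x_a, d_a), (x_b, d_b)]:
    assert f((x[0] + d[0], x[1] + d[1])) == 1

print("input distance:      ", dist(x_a, x_b))
print("attribution distance:", dist(d_a, d_b))
print("ratio:               ", dist(d_a, d_b) / dist(x_a, x_b))
```

This only illustrates the tension for one hand-picked attribution on one toy function; the paper's impossibility result is a general statement about every attribution method, which a single example like this does not establish.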