Paper title
RelatIF: Identifying Explanatory Training Examples via Relative Influence
Paper authors
Abstract
In this work, we focus on the use of influence functions to identify relevant training examples that one might hope "explain" the predictions of a machine learning model. One shortcoming of influence functions is that the training examples deemed most "influential" are often outliers or mislabelled, making them poor choices for explanation. In order to address this shortcoming, we separate the role of global versus local influence. We introduce RelatIF, a new class of criteria for choosing relevant training examples by way of an optimization objective that places a constraint on global influence. RelatIF considers the local influence that an explanatory example has on a prediction relative to its global effects on the model. In empirical evaluations, we find that the examples returned by RelatIF are more intuitive when compared to those found using influence functions.
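The idea of weighing local influence against global influence can be illustrated with a small sketch. The abstract does not give the formulas, so everything here is an assumption: the plain influence score is taken to be the usual inverse-Hessian-weighted inner product of gradients, and the relative score divides it by the norm of the parameter change that the training example would induce. The function names, the toy gradients, and the identity inverse Hessian are all hypothetical and chosen only to keep the example self-contained.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def influence_scores(test_grad, train_grads, inv_hessian):
    """Assumed plain influence: -g_test^T H^{-1} g_i for each training example."""
    return [-dot(test_grad, matvec(inv_hessian, g)) for g in train_grads]


def relative_influence_scores(test_grad, train_grads, inv_hessian):
    """Assumed relative influence: the local effect on the test prediction
    divided by the size of the global parameter change H^{-1} g_i."""
    scores = []
    for g in train_grads:
        param_change = matvec(inv_hessian, g)  # global effect on the model
        global_norm = math.sqrt(dot(param_change, param_change))
        local = -dot(test_grad, param_change)  # local effect on the test point
        scores.append(local / global_norm if global_norm > 0 else 0.0)
    return scores


# Toy data (hypothetical): identity inverse Hessian, two training examples.
inv_hessian = [[1.0, 0.0], [0.0, 1.0]]
test_grad = [1.0, 0.0]
train_grads = [
    [-10.0, -10.0],  # outlier-like example with a very large gradient
    [-1.0, 0.0],     # small gradient, well aligned with the test point
]

plain = influence_scores(test_grad, train_grads, inv_hessian)
relative = relative_influence_scores(test_grad, train_grads, inv_hessian)
print("plain influence:   ", plain)
print("relative influence:", relative)
```

In this toy setting, ranking by the largest plain score puts the large-gradient example first, while ranking by the relative score puts the well-aligned example first. That matches the qualitative behavior the abstract describes, though the ranking convention and normalization used here are illustrative choices, not the paper's stated definitions.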