Paper Title
What Makes a Good Explanation?: A Harmonized View of Properties of Explanations
Paper Authors
Paper Abstract
Interpretability provides a means for humans to verify aspects of machine learning (ML) models and empower human+ML teaming in situations where the task cannot be fully automated. Different contexts require explanations with different properties. For example, the kind of explanation required to determine if an early cardiac arrest warning system is ready to be integrated into a care setting is very different from the type of explanation required for a loan applicant to help determine the actions they might need to take to make their application successful. Unfortunately, there is a lack of standardization when it comes to properties of explanations: different papers may use the same term to mean different quantities, and different terms to mean the same quantity. This lack of a standardized terminology and categorization of the properties of ML explanations prevents us from both rigorously comparing interpretable machine learning methods and identifying what properties are needed in what contexts. In this work, we survey properties defined in interpretable machine learning papers, synthesize them based on what they actually measure, and describe the trade-offs between different formulations of these properties. In doing so, we enable more informed selection of task-appropriate formulations of explanation properties as well as standardization for future work in interpretable machine learning.