Paper Title

Understanding Interpretability by generalized distillation in Supervised Classification

Paper Authors

Adit Agarwal, K. K. Shukla, Arjan Kuijper, Anirban Mukhopadhyay

Abstract

The ability to interpret decisions taken by Machine Learning (ML) models is fundamental to encourage trust and reliability in different practical applications. Recent interpretation strategies focus on human understanding of the underlying decision mechanisms of the complex ML models. However, these strategies are restricted by the subjective biases of humans. To dissociate from such human biases, we propose an interpretation-by-distillation formulation that is defined relative to other ML models. We generalize the distillation technique for quantifying interpretability, using an information-theoretic perspective, removing the role of ground-truth from the definition of interpretability. Our work defines the entropy of supervised classification models, providing bounds on the entropy of Piece-Wise Linear Neural Networks (PWLNs), along with the first theoretical bounds on the interpretability of PWLNs. We evaluate our proposed framework on the MNIST, Fashion-MNIST and Stanford40 datasets and demonstrate the applicability of the proposed theoretical framework in different supervised classification scenarios.
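A minimal sketch of the interpretation-by-distillation idea described in the abstract, not the authors' actual formulation: a simple student model is distilled from a teacher piece-wise linear network using only the teacher's soft outputs (no ground-truth labels), and the residual KL divergence on held-out data serves as a quantitative proxy for how interpretable the teacher is relative to the student model class. The synthetic data, architectures, and hyperparameters below are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Synthetic features standing in for MNIST / Fashion-MNIST inputs (assumption).
X_train, X_test = torch.randn(2048, 32), torch.randn(512, 32)

# Teacher: a piece-wise linear network (ReLU MLP). In practice this would be a
# trained classifier; here it is randomly initialized purely for illustration.
teacher = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
teacher.eval()

# Student: a strictly simpler (linear) model acting as the interpreter.
student = nn.Linear(32, 10)
opt = torch.optim.Adam(student.parameters(), lr=1e-2)

# Distillation: match the teacher's output distribution; ground truth plays no role.
for _ in range(200):
    with torch.no_grad():
        p_teacher = F.softmax(teacher(X_train), dim=1)
    log_p_student = F.log_softmax(student(X_train), dim=1)
    loss = F.kl_div(log_p_student, p_teacher, reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()

# Interpretability proxy: how closely the student reproduces the teacher on
# unseen inputs (lower residual KL => teacher is better explained by this class).
with torch.no_grad():
    p_t = F.softmax(teacher(X_test), dim=1)
    log_p_s = F.log_softmax(student(X_test), dim=1)
    score = F.kl_div(log_p_s, p_t, reduction="batchmean").item()
print(f"Residual KL on held-out data: {score:.4f}")
```

The key design choice mirrored here is that the score is defined relative to another model class (the student) rather than relative to human judgment or ground-truth labels, which is the dissociation from subjective bias that the abstract emphasizes.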
