Paper Title

The Explanation Game: Towards Prediction Explainability through Sparse Communication

Authors

Marcos V. Treviso, André F. T. Martins

Abstract

Explainability is a topic of growing importance in NLP. In this work, we provide a unified perspective of explainability as a communication problem between an explainer and a layperson about a classifier's decision. We use this framework to compare several prior approaches for extracting explanations, including gradient methods, representation erasure, and attention mechanisms, in terms of their communication success. In addition, we reinterpret these methods in the light of classical feature selection, and we use this as inspiration to propose new embedded methods for explainability, through the use of selective, sparse attention. Experiments in text classification, natural language entailment, and machine translation, using different configurations of explainers and laypeople (including both machines and humans), reveal an advantage of attention-based explainers over gradient and erasure methods. Furthermore, human evaluation experiments show promising results with post-hoc explainers trained to optimize communication success and faithfulness.
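The communication framing above can be illustrated with a toy sketch: an explainer keeps only the highest-attention tokens as a sparse "message", and a layperson must classify from that message alone; communication succeeds when the layperson reproduces the classifier's decision. All names, the top-k selection, and the keyword "layperson" below are illustrative assumptions, not the paper's actual models.

```python
# Toy sketch of explainer/layperson communication via sparse attention.
# The scoring, k=2 cutoff, and keyword layperson are assumptions for
# illustration only; the paper trains real models for both roles.

def sparse_explanation(tokens, attention, k=2):
    """Explainer: keep only the k highest-attention tokens (the sparse message)."""
    ranked = sorted(range(len(tokens)), key=lambda i: attention[i], reverse=True)
    keep = sorted(ranked[:k])  # restore original word order
    return [tokens[i] for i in keep]

def layperson_predict(message, positive_words=frozenset({"great", "superb"})):
    """Layperson: a trivial classifier that sees only the sparse message."""
    return "positive" if any(w in positive_words for w in message) else "negative"

tokens = ["the", "plot", "was", "superb", "but", "slow"]
attention = [0.02, 0.10, 0.03, 0.60, 0.05, 0.20]

message = sparse_explanation(tokens, attention, k=2)
print(message)                     # ['superb', 'slow']
print(layperson_predict(message))  # 'positive'
```

If the classifier's own decision on the full sentence is also "positive", this round of communication counts as a success; measuring that agreement rate is how the abstract compares gradient, erasure, and attention-based explainers.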
