Paper Title

Self-Explaining Structures Improve NLP Models

Paper Authors

Zijun Sun, Chun Fan, Qinghong Han, Xiaofei Sun, Yuxian Meng, Fei Wu, Jiwei Li

Paper Abstract

Existing approaches to explaining deep learning models in NLP usually suffer from two major drawbacks: (1) the main model and the explaining model are decoupled: an additional probing or surrogate model is used to interpret an existing model, and thus existing explaining tools are not self-explainable; (2) the probing model can only explain a model's predictions by operating on low-level features, i.e., by computing saliency scores for individual words, and is clumsy at high-level text units such as phrases, sentences, or paragraphs. To address these two issues, in this paper we propose a simple yet general and effective self-explaining framework for deep learning models in NLP. The key idea of the proposed framework is to put an additional layer, referred to as the interpretation layer, on top of any existing NLP model. This layer aggregates the information for each text span, associates it with a specific weight, and feeds the weighted combination to the softmax function for the final prediction. The proposed model comes with the following merits: (1) the span weights make the model self-explainable, so no additional probing model is required for interpretation; (2) the proposed model is general and can be adapted to any existing deep learning structure in NLP; (3) the weight associated with each text span provides direct importance scores for higher-level text units such as phrases and sentences. We show for the first time that interpretability does not come at the cost of performance: a neural model with self-explaining features obtains better performance than its counterpart without the self-explaining nature, achieving a new SOTA performance of 59.1 on SST-5 and a new SOTA of 92.3 on SNLI.
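
The abstract only describes the interpretation layer at a high level. The sketch below is a minimal, hypothetical PyTorch rendering of that idea under stated assumptions: span representations aggregated (here by mean pooling) from an arbitrary backbone encoder, a softmax-normalized weight per span, and the weighted combination of spans fed to the classifier. The names (`SpanInterpretationLayer`, `span_scorer`, `hidden_dim`) and the exhaustive span enumeration are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch (assumption): a self-explaining "interpretation layer" on top of
# any encoder, following the abstract's description. Span enumeration, pooling,
# and dimensions are illustrative choices, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpanInterpretationLayer(nn.Module):
    def __init__(self, hidden_dim: int, num_classes: int):
        super().__init__()
        self.span_scorer = nn.Linear(hidden_dim, 1)    # one score per text span
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_states: torch.Tensor):
        # token_states: (batch, seq_len, hidden_dim) from any backbone encoder.
        batch, seq_len, dim = token_states.shape

        # Enumerate all contiguous spans (i, j) and aggregate each by mean pooling.
        spans = []
        for i in range(seq_len):
            for j in range(i, seq_len):
                spans.append(token_states[:, i : j + 1, :].mean(dim=1))
        span_reprs = torch.stack(spans, dim=1)          # (batch, num_spans, hidden_dim)

        # Associate each span with a weight via softmax over the span scores.
        span_weights = F.softmax(self.span_scorer(span_reprs).squeeze(-1), dim=-1)

        # Weighted combination of span representations -> final softmax prediction.
        pooled = torch.einsum("bs,bsd->bd", span_weights, span_reprs)
        logits = self.classifier(pooled)
        # The span weights double as direct importance scores for phrases/spans.
        return F.log_softmax(logits, dim=-1), span_weights


# Usage sketch: plug the layer on top of any encoder's hidden states.
if __name__ == "__main__":
    hidden = torch.randn(2, 8, 16)                      # stand-in encoder output
    layer = SpanInterpretationLayer(hidden_dim=16, num_classes=5)
    log_probs, weights = layer(hidden)
    print(log_probs.shape, weights.shape)               # (2, 5) and (2, 36)
```

Note that enumerating all O(n^2) contiguous spans is only one possible design; the same weighted-combination idea works with any predefined set of spans (e.g., phrases from a parser), and the returned `span_weights` are what make the model self-explainable without an external probing model.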
