Title
Learning from the Best: Rationalizing Prediction by Adversarial Information Calibration
Authors
Abstract
Explaining the predictions of AI models is paramount in safety-critical applications, such as in legal or medical domains. One form of explanation for a prediction is an extractive rationale, i.e., a subset of features of an instance that lead the model to give its prediction on the instance. Previous works on generating extractive rationales usually employ a two-phase model: a selector that selects the most important features (i.e., the rationale) followed by a predictor that makes the prediction based exclusively on the selected features. One disadvantage of these works is that the main signal for learning to select features comes from the comparison of the answers given by the predictor and the ground-truth answers. In this work, we propose to squeeze more information from the predictor via an information calibration method. More precisely, we train two models jointly: one is a typical neural model that solves the task at hand in an accurate but black-box manner, and the other is a selector-predictor model that additionally produces a rationale for its prediction. The first model is used as a guide to the second model. We use an adversarial-based technique to calibrate the information extracted by the two models such that the difference between them is an indicator of the missed or over-selected features. In addition, for natural language tasks, we propose to use a language-model-based regularizer to encourage the extraction of fluent rationales. Experimental results on a sentiment analysis task as well as on three tasks from the legal domain show the effectiveness of our approach to rationale extraction.
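The abstract describes a selector-predictor model trained alongside an accurate black-box guide, with an adversarial critic comparing the information each extracts. The toy sketch below illustrates that structure with untrained linear components in NumPy; all weights, the top-k selection rule, and the linear critic are hypothetical stand-ins, not the paper's actual architecture or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy instance with 8 features (hypothetical example data).
x = rng.normal(size=8)

# --- Selector: scores each feature and keeps the top-k as the rationale ---
W_sel = rng.normal(size=8)          # hypothetical selector weights
scores = sigmoid(W_sel * x)         # per-feature importance in (0, 1)
k = 3
mask = np.zeros_like(x)
mask[np.argsort(scores)[-k:]] = 1.0  # hard top-k selection

# --- Predictor: sees only the selected features ---
W_pred = rng.normal(size=8)
y_rationale = sigmoid(W_pred @ (mask * x))

# --- Guide: black-box model operating on the full input ---
W_guide = rng.normal(size=8)
h_guide = W_guide * x               # "information" extracted by the guide
h_sel = W_pred * (mask * x)         # information from the selector-predictor

# --- Adversarial calibration (sketch): a critic tries to tell the two
# representations apart; its confidence signals missed or over-selected
# features. A fixed linear critic stands in for a trained discriminator.
W_disc = rng.normal(size=8)
d_sel = sigmoid(W_disc @ h_sel)     # critic's score for the rationale model
adv_loss = -np.log(d_sel + 1e-8)    # selector is trained to fool the critic

print(f"rationale indices: {np.flatnonzero(mask)}")
print(f"adversarial loss: {adv_loss:.3f}")
```

In the full method the critic and both models would be trained jointly, and for language tasks an additional language-model regularizer would penalize disfluent rationales; this sketch only shows the data flow between the three components.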