Paper title
Explaining Predictive Uncertainty by Looking Back at Model Explanations
Authors
Abstract
Predictive uncertainty estimates from pre-trained language models are an important measure of how far people can trust the models' predictions. However, little is known about what makes a model prediction uncertain. Explaining predictive uncertainty is an important complement to explaining predicted labels: it helps users understand model decision making and gain trust in model predictions, yet it has been largely ignored in prior work. In this work, we propose to explain the predictive uncertainty of pre-trained language models by extracting uncertain words from existing model explanations. We find that the uncertain words are those identified as making negative contributions to the predicted labels, and that these words in fact explain the predictive uncertainty. Experiments show that uncertainty explanations are indispensable to explaining models and helping humans understand model prediction behavior.
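As a rough illustration of the idea in the abstract, the sketch below takes per-token attribution scores toward the predicted label, as an existing explanation method might produce, and keeps the tokens with negative scores as candidate uncertain words. The function name `extract_uncertain_words`, the `top_k` parameter, and the example tokens and scores are hypothetical and chosen for illustration only; the abstract does not specify the attribution method or the selection rule, so this should not be read as the authors' procedure.

```python
from typing import List, Optional, Tuple


def extract_uncertain_words(
    tokens: List[str],
    attributions: List[float],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Return tokens whose attribution toward the predicted label is negative.

    Assumption: `attributions[i]` is the contribution of `tokens[i]` to the
    predicted label, as computed by some existing explanation method.
    """
    if len(tokens) != len(attributions):
        raise ValueError("tokens and attributions must have the same length")

    # Keep only tokens that push against the predicted label.
    negative = [(tok, score) for tok, score in zip(tokens, attributions) if score < 0]

    # Most negative contribution first.
    negative.sort(key=lambda pair: pair[1])

    return negative if top_k is None else negative[:top_k]


if __name__ == "__main__":
    # Made-up scores for a sentence predicted as "positive" (illustrative only).
    tokens = ["the", "plot", "is", "dull", "but", "the", "acting", "is", "great"]
    scores = [0.01, 0.05, 0.0, -0.42, -0.10, 0.01, 0.20, 0.02, 0.55]
    print(extract_uncertain_words(tokens, scores, top_k=2))
```

In this toy input, the words that argue against the predicted label ("dull", "but") are the ones surfaced as candidate explanations of the prediction's uncertainty.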