Paper Title
REV: Information-Theoretic Evaluation of Free-Text Rationales
Paper Authors
Abstract
Generating free-text rationales is a promising step towards explainable NLP, yet evaluating such rationales remains a challenge. Existing metrics have mostly focused on measuring the association between the rationale and a given label. We argue that an ideal metric should focus on the new information uniquely provided in the rationale that is otherwise not provided in the input or the label. We investigate this research problem from an information-theoretic perspective using conditional V-information (Hewitt et al., 2021). More concretely, we propose a metric called REV (Rationale Evaluation with conditional V-information) to quantify the amount of new, label-relevant information in a rationale beyond the information already available in the input or the label. Experiments across four benchmarks with reasoning tasks, including chain-of-thought, demonstrate the effectiveness of REV in evaluating rationale-label pairs, compared to existing metrics. We further demonstrate that REV is consistent with human judgments on rationale evaluations and provides more sensitive measurements of new information in free-text rationales. When used alongside traditional performance metrics, REV provides deeper insights into models' reasoning and prediction processes.
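The central quantity can be illustrated with a small sketch. Conditional V-information (Hewitt et al., 2021) is I_V(R -> Y | B) = H_V(Y | B) - H_V(Y | R, B), where H_V is the V-entropy: the lowest expected negative log-likelihood of the label Y that a model from family V can achieve given the conditioning variables. Here R is the rationale and B stands for the information already available from the input and label. The code below is a minimal sketch under stated assumptions, not the authors' implementation. It assumes two already-trained evaluator models and takes, for each example, p_full (the probability the evaluator that sees the rationale assigns to the gold label) and p_base (the probability the baseline evaluator without the rationale assigns to it). The function names `pointwise_rev` and `corpus_rev`, the use of bits (log base 2), and the example probabilities are all hypothetical choices made for illustration.

```python
import math
from typing import Iterable, Tuple


def pointwise_rev(p_full: float, p_base: float) -> float:
    """Hypothetical pointwise conditional V-information score, in bits.

    p_full: probability of the gold label from the evaluator that sees the rationale.
    p_base: probability of the gold label from the baseline evaluator (no rationale).
    A positive score means the rationale made the gold label more predictable
    than the baseline; a negative score means it made it less predictable.
    """
    if not (0.0 < p_full <= 1.0 and 0.0 < p_base <= 1.0):
        raise ValueError("probabilities must lie in (0, 1]")
    # (-log2 p_base) - (-log2 p_full): baseline surprisal minus surprisal with the rationale
    return math.log2(p_full) - math.log2(p_base)


def corpus_rev(pairs: Iterable[Tuple[float, float]]) -> float:
    """Average pointwise scores over a dataset.

    With well-trained evaluators this mean is an empirical estimate of
    H_V(Y | B) - H_V(Y | R, B).
    """
    scores = [pointwise_rev(p_full, p_base) for p_full, p_base in pairs]
    if not scores:
        raise ValueError("need at least one (p_full, p_base) pair")
    return sum(scores) / len(scores)


# Hypothetical evaluator outputs for three examples: (p_full, p_base)
examples = [(0.9, 0.45), (0.5, 0.5), (0.2, 0.4)]
for p_full, p_base in examples:
    print(f"{pointwise_rev(p_full, p_base):+.2f}")
```

In this toy setup, the first rationale doubles the probability of the gold label (about +1 bit), the second adds nothing beyond the baseline (0 bits), and the third halves it (about -1 bit). Working at the per-example level, and only then averaging, is what lets a score of this kind rate individual rationale-label pairs rather than only whole datasets.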