Paper Title


LENS: A Learnable Evaluation Metric for Text Simplification

Paper Authors

Mounica Maddela, Yao Dou, David Heineman, Wei Xu

Paper Abstract


Training learnable metrics using modern language models has recently emerged as a promising method for the automatic evaluation of machine translation. However, existing human evaluation datasets for text simplification have limited annotations that are based on unitary or outdated models, making them unsuitable for this approach. To address these issues, we introduce the SimpEval corpus, which contains SimpEval_past, comprising 12K human ratings on 2.4K simplifications from 24 past systems, and SimpEval_2022, a challenging simplification benchmark consisting of over 1K human ratings of 360+ simplifications, including GPT-3.5-generated text. Training on SimpEval, we present LENS, a Learnable Evaluation Metric for Text Simplification. Extensive empirical results show that LENS correlates much better with human judgment than existing metrics, paving the way for future progress in the evaluation of text simplification. We also introduce Rank and Rate, a human evaluation framework that rates simplifications from several models in a list-wise manner using an interactive interface, which ensures both consistency and accuracy in the evaluation process and is used to create the SimpEval datasets.
