Is human scoring the best criteria for summary evaluation?

Paper title

Is human scoring the best criteria for summary evaluation?

Authors

Oleg Vasilyev, John Bohannon

Abstract

Normally, summary quality measures are compared with quality scores produced by human annotators. A higher correlation with human scores is considered to be a fair indicator of a better measure. We discuss observations that cast doubt on this view. We attempt to show a possibility of an alternative indicator. Given a family of measures, we explore a criterion of selecting the best measure not relying on correlations with human scores. Our observations for the BLANC family of measures suggest that the criterion is universal across very different styles of summaries.
