Paper Title
CiteBench: A Benchmark for Scientific Citation Text Generation
Paper Authors
Paper Abstract
Science progresses by building upon the prior body of knowledge documented in scientific publications. The acceleration of research makes it hard to stay up-to-date with the recent developments and to summarize the ever-growing body of prior work. To address this, the task of citation text generation aims to produce accurate textual summaries given a set of papers-to-cite and the citing paper context. Due to the otherwise rare explicit anchoring of cited documents in the citing paper, citation text generation provides an excellent opportunity to study how humans aggregate and synthesize textual knowledge from sources. Yet, existing studies are based upon widely diverging task definitions, which makes it hard to study this task systematically. To address this challenge, we propose CiteBench: a benchmark for citation text generation that unifies multiple diverse datasets and enables standardized evaluation of citation text generation models across task designs and domains. Using the new benchmark, we investigate the performance of multiple strong baselines, test their transferability between the datasets, and deliver new insights into the task definition and evaluation to guide future research in citation text generation. We make the code for CiteBench publicly available at https://github.com/UKPLab/citebench.
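To make the task format described in the abstract concrete, below is a minimal illustrative Python sketch of the input/output structure: a set of papers-to-cite and the citing paper context map to a target citation text. All names here (CitationInstance, lead_baseline) are hypothetical and are not part of the actual CiteBench codebase; the repository linked above defines the real data format and baselines.

from dataclasses import dataclass
from typing import List

# Illustrative sketch only: the names below are hypothetical and do not
# reflect the actual CiteBench API.
@dataclass
class CitationInstance:
    cited_abstracts: List[str]  # abstracts of the papers-to-cite (sources)
    citing_context: str         # text surrounding the citation in the citing paper
    target: str                 # human-written citation text to reproduce

def lead_baseline(instance: CitationInstance) -> str:
    """Trivial extractive baseline: first sentence of each cited abstract.
    Stronger baselines would instead condition a sequence-to-sequence model
    on cited_abstracts and citing_context."""
    firsts = [a.split(". ")[0].rstrip(".") + "." for a in instance.cited_abstracts]
    return " ".join(firsts)

example = CitationInstance(
    cited_abstracts=["We introduce a new summarization model. It improves ROUGE."],
    citing_context="Prior work has explored abstractive summarization [CITATION].",
    target="Smith et al. propose a model that improves summarization quality.",
)
print(lead_baseline(example))  # -> We introduce a new summarization model.

Under this framing, a benchmark run amounts to generating one candidate text per instance and scoring it against the target, for example with standard summarization metrics such as ROUGE.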