Paper Title
UniSumm and SummZoo: Unified Model and Diverse Benchmark for Few-Shot Summarization
Paper Authors
Paper Abstract
The high annotation costs and diverse demands of various summarization tasks motivate the development of few-shot summarization. However, despite the emergence of many summarization tasks and datasets, the current training paradigm for few-shot summarization systems ignores potentially shareable knowledge in heterogeneous datasets. To this end, we propose \textsc{UniSumm}, a unified few-shot summarization model that is pre-trained on multiple summarization tasks and can be prefix-tuned to excel at any few-shot summarization task. Meanwhile, to better evaluate few-shot summarizers, we assemble and release a new benchmark, \textsc{SummZoo}, under the principles of diversity and robustness. It consists of $8$ summarization tasks, each with multiple sets of few-shot samples, covering diverse domains. Experimental results and analysis show that \textsc{UniSumm} outperforms strong baselines by a large margin across all sub-tasks in \textsc{SummZoo} under both automatic and human evaluations, and achieves results comparable to a GPT-3.5 model in human evaluation.
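The prefix-tuning setup mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it only shows the core mechanism under common assumptions: the pre-trained backbone weights stay frozen, and the only per-task parameters are a small set of learned "prefix" key/value vectors prepended inside each attention layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

d = 8  # hidden size (toy value)

# Frozen "pre-trained" projections, standing in for the multi-task backbone.
Wq = rng.normal(size=(d, d))
Wk = rng.normal(size=(d, d))
Wv = rng.normal(size=(d, d))

# Task-specific trainable prefix: m pseudo-token keys/values.
# In few-shot tuning, only these would receive gradient updates.
m = 4
prefix_k = rng.normal(size=(m, d))
prefix_v = rng.normal(size=(m, d))

def prefix_attention(x):
    """Single-head attention with learned prefix keys/values
    prepended to the input's own keys and values."""
    q = x @ Wq
    k = np.concatenate([prefix_k, x @ Wk], axis=0)  # (m + n, d)
    v = np.concatenate([prefix_v, x @ Wv], axis=0)
    weights = softmax(q @ k.T / np.sqrt(d))         # each query attends over prefix + input
    return weights @ v

x = rng.normal(size=(5, d))  # 5 input tokens
out = prefix_attention(x)
print(out.shape)  # one output vector per input token: (5, 8)
```

Because the backbone is shared and frozen, adapting to a new summarization task only requires storing and training the small prefix, which is what makes this practical in the few-shot regime.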