Title
Readability Controllable Biomedical Document Summarization
Authors
Abstract
Unlike general documents, biomedical texts vary widely in how easily readers can understand them, owing to the highly technical nature of biomedical documents and the variance in readers' domain knowledge. However, existing biomedical document summarization systems have paid little attention to readability control, leaving users with summaries that are incompatible with their levels of expertise. In response to this urgent demand, we introduce a new task of readability controllable summarization for biomedical documents, which aims to recognise users' readability demands and generate summaries that better suit their needs: technical summaries for experts and plain language summaries (PLS) for laymen. To establish this task, we construct a corpus consisting of biomedical papers with technical summaries and PLSs written by the authors, and benchmark multiple advanced controllable abstractive and extractive summarization models based on pre-trained language models (PLMs) with prevalent controlling and generation techniques. Moreover, we propose a novel masked language model (MLM) based metric and its variant to effectively evaluate the readability discrepancy between lay and technical summaries. Experimental results from automated and human evaluations show that although current control techniques allow for a certain degree of readability adjustment during generation, the performance of existing controllable summarization methods is far from desirable on this task.