Paper Title

Toward Unifying Text Segmentation and Long Document Summarization

Authors

Sangwoo Cho, Kaiqiang Song, Xiaoyang Wang, Fei Liu, Dong Yu

Abstract

Text segmentation is important for signaling a document's structure. Without segmenting a long document into topically coherent sections, it is difficult for readers to comprehend the text, let alone find important information. The problem is only exacerbated by a lack of segmentation in transcripts of audio/video recordings. In this paper, we explore the role that section segmentation plays in extractive summarization of written and spoken documents. Our approach learns robust sentence representations by performing summarization and segmentation simultaneously, which is further enhanced by an optimization-based regularizer to promote selection of diverse summary sentences. We conduct experiments on multiple datasets ranging from scientific articles to spoken transcripts to evaluate the model's performance. Our findings suggest that the model can not only achieve state-of-the-art performance on publicly available benchmarks, but also demonstrate better cross-genre transferability when equipped with text segmentation. We perform a series of analyses to quantify the impact of section segmentation on summarizing written and spoken documents of substantial length and complexity.
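
As a rough illustration of what "performing summarization and segmentation simultaneously" could look like as a training objective, the sketch below combines two per-sentence binary classification losses: one for whether a sentence belongs in the extractive summary and one for whether it starts a new section. This is a generic multi-task formulation and an assumption on our part, not the authors' implementation. The function names, the `seg_weight` parameter, and the choice of binary cross-entropy are all hypothetical, and the optimization-based diversity regularizer mentioned in the abstract is deliberately left out because the abstract does not specify it.

```python
import math


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        # Clamp to avoid log(0) on fully confident predictions.
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def joint_loss(summary_probs, summary_labels,
               boundary_probs, boundary_labels,
               seg_weight=1.0):
    """Hypothetical joint objective: summarization loss plus weighted segmentation loss.

    summary_probs[i]  : predicted probability that sentence i is in the summary
    boundary_probs[i] : predicted probability that sentence i starts a new section
    seg_weight        : assumed trade-off hyperparameter (not from the paper)
    """
    summarization = binary_cross_entropy(summary_probs, summary_labels)
    segmentation = binary_cross_entropy(boundary_probs, boundary_labels)
    return summarization + seg_weight * segmentation


# Toy document with four sentences.
loss = joint_loss(
    summary_probs=[0.9, 0.2, 0.1, 0.7],
    summary_labels=[1, 0, 0, 1],
    boundary_probs=[0.8, 0.1, 0.6, 0.2],
    boundary_labels=[1, 0, 1, 0],
)
print(f"joint loss: {loss:.4f}")
```

Because both heads would share the same sentence representations in such a setup, the segmentation labels act as an auxiliary signal. Setting `seg_weight=0.0` recovers a summarization-only objective, which gives a simple baseline for comparison.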
