Paper Title
Dialogue Discourse-Aware Graph Model and Data Augmentation for Meeting Summarization
Paper Authors
Paper Abstract
Meeting summarization is a challenging task due to the dynamic interactions among multiple speakers and the lack of sufficient training data. Existing methods view a meeting as a linear sequence of utterances while ignoring the diverse relations among them. Besides, the limited labeled data further hinders the ability of data-hungry neural models. In this paper, we try to mitigate the above challenges by introducing dialogue-discourse relations. First, we present a Dialogue Discourse-Aware Meeting Summarizer (DDAMS) to explicitly model the interactions between utterances in a meeting by modeling different discourse relations. The core module is a relational graph encoder, where the utterances and discourse relations are modeled in a graph interaction manner. Moreover, we devise a Dialogue Discourse-Aware Data Augmentation (DDADA) strategy to construct a pseudo-summarization corpus from existing input meetings, which is 20 times larger than the original dataset and can be used to pretrain DDAMS. Experimental results on the AMI and ICSI meeting datasets show that our full system can achieve SOTA performance. Our code will be available at: https://github.com/xcfcode/DDAMS.
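To make the "relational graph encoder" idea concrete, here is a minimal, hypothetical sketch (not the authors' implementation): utterances are graph nodes, typed discourse relations (e.g. "Question-Answer", "Elaboration") are labeled directed edges, and one R-GCN-style layer lets each node combine its own feature with relation-weighted means of its neighbors' features. The utterance texts, relation names, scalar "encodings", and relation weights below are all illustrative stand-ins.

```python
from collections import defaultdict

# Toy meeting: three utterances as nodes (indices 0-2).
utterances = [
    "A: Shall we pick a color for the remote?",  # 0
    "B: I'd go with yellow.",                    # 1
    "B: Yellow is bright and easy to find.",     # 2
]

# (head, tail, relation) triples: discourse links between utterances.
relations = [
    (0, 1, "Question-Answer"),
    (1, 2, "Elaboration"),
]

def build_relational_graph(triples):
    """Group each node's incoming edges by relation type."""
    graph = defaultdict(lambda: defaultdict(list))
    for head, tail, rel in triples:
        graph[tail][rel].append(head)
    return graph

def relational_aggregate(node_feats, graph, rel_weight):
    """One relational-graph layer: each node adds to its own feature a
    relation-weighted mean of its neighbors, per relation type."""
    out = []
    for i, feat in enumerate(node_feats):
        agg = feat
        for rel, heads in graph[i].items():
            w = rel_weight.get(rel, 1.0)
            mean = sum(node_feats[h] for h in heads) / len(heads)
            agg += w * mean
        out.append(agg)
    return out

graph = build_relational_graph(relations)
feats = [1.0, 2.0, 3.0]  # stand-in scalar "utterance encodings"
weights = {"Question-Answer": 0.5, "Elaboration": 1.0}
print(relational_aggregate(feats, graph, weights))  # → [1.0, 2.5, 5.0]
```

In a real model the scalar features would be vector utterance encodings and the per-relation weights would be learned transformation matrices; the graph wiring and per-relation aggregation, however, follow the same pattern.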