Paper Title
Neural Data-to-Text Generation via Jointly Learning the Segmentation and Correspondence
Paper Authors
Abstract
The neural attention model has achieved great success in data-to-text generation tasks. Though usually excelling at producing fluent text, it suffers from the problems of missing information, repetition, and "hallucination". Due to the black-box nature of the neural attention architecture, avoiding these problems in a systematic way is non-trivial. To address this concern, we propose to explicitly segment the target text into fragment units and align them with their data correspondences. The segmentation and correspondence are jointly learned as latent variables without any human annotations. We further impose a soft statistical constraint to regularize the segmental granularity. The resulting architecture maintains the same expressive power as neural attention models, while being able to generate fully interpretable outputs with several times less computational cost. On both the E2E and WebNLG benchmarks, we show that the proposed model consistently outperforms its neural attention counterparts.
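The core idea — viewing the output text as a sequence of fragments, each traceable to one source record — can be illustrated with a toy sketch. This is not the paper's model (where segmentation and alignment are latent variables learned jointly); it is a hand-written example in the style of the E2E dataset, showing why an explicit correspondence makes outputs interpretable and makes failures like missing information easy to detect. All names below (`records`, `segments`, `render`, `coverage`) are illustrative assumptions.

```python
# Toy illustration (not the paper's model): a data-to-text output viewed as
# segments, each aligned to one source record. In the proposed architecture
# this alignment is a latent variable; here it is written out by hand.
records = {
    "name": "The Eagle",
    "eatType": "coffee shop",
    "area": "riverside",
}

# Each fragment of the target text is paired with the record key it realizes.
segments = [
    ("The Eagle", "name"),
    ("is a coffee shop", "eatType"),
    ("by the riverside", "area"),
]

def render(segments):
    """Concatenate the fragments into the surface text. Because every
    fragment carries its data correspondence, the output is interpretable."""
    return " ".join(fragment for fragment, _ in segments)

def uncovered(segments, records):
    """Return record keys no segment realizes — a non-empty result flags
    the 'missing information' failure mode the paper targets."""
    used = {key for _, key in segments}
    return set(records) - used

print(render(segments))             # The Eagle is a coffee shop by the riverside
print(uncovered(segments, records)) # set()
```

With an explicit segmentation like this, repetition (two segments realizing the same key) and hallucination (a segment aligned to no key) become equally easy to check, which is what the black-box attention model does not expose.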