Paper Title

Entity Aware Syntax Tree Based Data Augmentation for Natural Language Understanding

Paper Authors

Jiaxing Xu, Jianbin Cui, Jiangneng Li, Wenge Rong, Noboru Matsuda

Paper Abstract

Understanding users' intentions and recognizing the semantic entities in their sentences, i.e., natural language understanding (NLU), is an upstream task for many natural language processing tasks. One of the main challenges is collecting a sufficient amount of annotated data to train a model. Existing research on text augmentation does not sufficiently consider entities and thus performs poorly on NLU tasks. To solve this problem, we propose a novel NLP data augmentation technique, Entity Aware Data Augmentation (EADA), which applies a tree structure, the Entity Aware Syntax Tree (EAST), to represent sentences with attention on their entities. Our EADA technique automatically constructs an EAST from a small amount of annotated data and then generates a large number of training instances for intent detection and slot filling. Experimental results on four datasets show that the proposed technique significantly outperforms existing data augmentation methods in terms of both accuracy and generalization ability.
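
The paper's actual EAST construction and generation procedure is not reproduced here. As a rough, hypothetical illustration of what entity-aware augmentation for intent detection and slot filling can look like, the Python sketch below recombines observed slot values across annotated sentences; the data layout and the `augment` helper are assumptions made for this example, not the authors' EADA implementation.

```python
from collections import defaultdict

# Toy slot-annotated data: (tokens, intent, slots), where `slots` maps a
# token position to its slot type. Both the data and `augment` are
# hypothetical and only illustrate entity-aware augmentation in spirit.
examples = [
    (["book", "a", "flight", "to", "Boston"], "BookFlight", {4: "destination"}),
    (["fly", "me", "to", "Paris", "tomorrow"], "BookFlight", {3: "destination"}),
    (["play", "some", "jazz"], "PlayMusic", {2: "genre"}),
]

def augment(data):
    """Create new instances by swapping slot values of the same type across sentences."""
    # Pool the surface values observed for each slot type (the "entity aware" part).
    values_by_type = defaultdict(set)
    for tokens, _, slots in data:
        for pos, slot_type in slots.items():
            values_by_type[slot_type].add(tokens[pos])

    generated = []
    for tokens, intent, slots in data:
        for pos, slot_type in slots.items():
            for value in values_by_type[slot_type]:
                if value == tokens[pos]:
                    continue  # keep only combinations not seen in the original sentence
                new_tokens = list(tokens)
                new_tokens[pos] = value
                generated.append((new_tokens, intent, dict(slots)))
    return generated

for tokens, intent, slots in augment(examples):
    print(intent, " ".join(tokens), slots)
# e.g. BookFlight "book a flight to Paris"   {4: 'destination'}
#      BookFlight "fly me to Boston tomorrow" {3: 'destination'}
```

Unlike this flat slot-swapping toy, EADA as described in the abstract organizes the annotated sentences into a syntax-tree structure (EAST) with attention on entities before generating new training instances.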
