Data Boost: Text Data Augmentation Through Reinforcement Learning Guided Conditional Generation

Paper Title

Data Boost: Text Data Augmentation Through Reinforcement Learning Guided Conditional Generation

Authors

Ruibo Liu, Guangxuan Xu, Chenyan Jia, Weicheng Ma, Lili Wang, Soroush Vosoughi

Abstract

Data augmentation has proven effective in many NLU tasks, especially those suffering from data scarcity. In this paper, we present a powerful and easy-to-deploy text augmentation framework, Data Boost, which augments data through reinforcement learning guided conditional generation. We evaluate Data Boost on three diverse text classification tasks under five different classifier architectures. The results show that Data Boost can boost the performance of classifiers, especially in low-resource data scenarios. For instance, Data Boost improves F1 for the three tasks by 8.7% on average when given only 10% of the whole data for training. We also compare Data Boost with six prior text augmentation methods. Through human evaluations (N=178), we confirm that Data Boost augmentation has quality comparable to the original data with respect to readability and class consistency.
