Paper Title
Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning
Paper Authors
Paper Abstract
Recent studies have revealed the intriguing few-shot learning ability of pretrained language models (PLMs): They can quickly adapt to a new task when fine-tuned on a small amount of labeled data formulated as prompts, without requiring abundant task-specific annotations. Despite their promising performance, most existing few-shot approaches that only learn from the small training set still underperform fully supervised training by nontrivial margins. In this work, we study few-shot learning with PLMs from a different perspective: We first tune an autoregressive PLM on the few-shot samples and then use it as a generator to synthesize a large amount of novel training samples which augment the original training set. To encourage the generator to produce label-discriminative samples, we train it via weighted maximum likelihood where the weight of each token is automatically adjusted based on a discriminative meta-learning objective. A classification PLM can then be fine-tuned on both the few-shot and the synthetic samples with regularization for better generalization and stability. Our approach FewGen achieves an overall better result across seven classification tasks of the GLUE benchmark than existing few-shot learning methods, improving no-augmentation methods by 5+ average points, and outperforming augmentation methods by 3+ average points.
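The weighted maximum-likelihood objective mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name is hypothetical, and the per-token weights are assumed to be supplied externally (in FewGen they are adjusted automatically by a discriminative meta-learning objective).

```python
import numpy as np

def weighted_mle_loss(logits, targets, token_weights):
    """Weighted negative log-likelihood over a token sequence.

    logits:        (seq_len, vocab) unnormalized scores from the generator
    targets:       (seq_len,) gold token ids
    token_weights: (seq_len,) per-token weights; here given as inputs,
                   whereas FewGen learns them via a meta-learning objective
    """
    # Numerically stable log-softmax over the vocabulary dimension.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Negative log-likelihood of each gold token.
    nll = -log_probs[np.arange(len(targets)), targets]
    # Weighted average: tokens with larger weights dominate the loss.
    return float((token_weights * nll).sum() / token_weights.sum())
```

With uniform weights this reduces to the ordinary (unweighted) maximum-likelihood loss; non-uniform weights let training emphasize label-discriminative tokens.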