Paper Title

Generating Training Data with Language Models: Towards Zero-Shot Language Understanding

Paper Authors

Yu Meng, Jiaxin Huang, Yu Zhang, Jiawei Han

Paper Abstract

Pretrained language models (PLMs) have demonstrated remarkable performance in various natural language processing tasks: Unidirectional PLMs (e.g., GPT) are well known for their superior text generation capabilities; bidirectional PLMs (e.g., BERT) have been the prominent choice for natural language understanding (NLU) tasks. While both types of models have achieved promising few-shot learning performance, their potential for zero-shot learning has been underexplored. In this paper, we present a simple approach that uses both types of PLMs for fully zero-shot learning of NLU tasks without requiring any task-specific data: A unidirectional PLM generates class-conditioned texts guided by prompts, which are used as the training data for fine-tuning a bidirectional PLM. With quality training data selected based on the generation probability and regularization techniques (label smoothing and temporal ensembling) applied to the fine-tuning stage for better generalization and stability, our approach demonstrates strong performance across seven classification tasks of the GLUE benchmark (e.g., 72.3/73.8 on MNLI-m/mm and 92.8 on SST-2), significantly outperforming zero-shot prompting methods and achieving even comparable results to strong few-shot approaches using 32 training samples per class.
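
The abstract describes a two-stage pipeline: a unidirectional PLM generates class-conditioned texts from label-descriptive prompts, the generations are filtered by their generation probability, and a bidirectional PLM is then fine-tuned on the resulting synthetic data with regularization. The sketch below is a minimal illustration of that idea using Hugging Face Transformers, not the authors' released implementation. It assumes GPT-2 as the generator, BERT-base as the classifier, hand-written sentiment prompts, a simple average-log-probability filter, and a fixed label-smoothing value of 0.1; temporal ensembling is omitted.

```python
# Minimal sketch (assumptions noted above) of: (1) class-conditioned generation
# with a unidirectional PLM, (2) quality selection by generation probability,
# (3) fine-tuning a bidirectional PLM on the synthetic data with label smoothing.
import torch
import torch.nn.functional as F
from transformers import (
    GPT2LMHeadModel, GPT2Tokenizer,
    BertForSequenceClassification, BertTokenizer,
)

device = "cuda" if torch.cuda.is_available() else "cpu"

# --- Stage 1: class-conditioned generation with a unidirectional PLM ---
gen_tok = GPT2Tokenizer.from_pretrained("gpt2")
gen_lm = GPT2LMHeadModel.from_pretrained("gpt2").to(device).eval()

# Hypothetical label-descriptive prompts for a binary sentiment task (SST-2 style).
prompts = {
    0: 'The movie review in negative sentiment is: "',
    1: 'The movie review in positive sentiment is: "',
}

def generate_samples(label, num_samples=8, max_new_tokens=40):
    """Sample texts conditioned on a class prompt and score each one by the
    average per-token log-probability of its generated continuation."""
    inputs = gen_tok(prompts[label], return_tensors="pt").to(device)
    outputs = gen_lm.generate(
        **inputs,
        do_sample=True,
        top_k=50,
        max_new_tokens=max_new_tokens,
        num_return_sequences=num_samples,
        pad_token_id=gen_tok.eos_token_id,
        return_dict_in_generate=True,
        output_scores=True,
    )
    prompt_len = inputs["input_ids"].shape[1]
    gen_ids = outputs.sequences[:, prompt_len:]
    # Per-step logits stacked to shape (num_samples, steps, vocab).
    logits = torch.stack(outputs.scores, dim=1)
    logprobs = F.log_softmax(logits, dim=-1)
    token_lp = logprobs.gather(-1, gen_ids.unsqueeze(-1)).squeeze(-1)
    avg_lp = token_lp.mean(dim=1)  # crude proxy for generation probability
    texts = gen_tok.batch_decode(gen_ids, skip_special_tokens=True)
    return list(zip(texts, avg_lp.tolist()))

# --- Stage 2: keep the most probable generations per class as training data ---
train_texts, train_labels = [], []
for label in prompts:
    scored = sorted(generate_samples(label), key=lambda x: -x[1])
    for text, _ in scored[: len(scored) // 2]:
        train_texts.append(text.strip())
        train_labels.append(label)

# --- Stage 3: fine-tune a bidirectional PLM on the synthetic data ---
clf_tok = BertTokenizer.from_pretrained("bert-base-uncased")
clf = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
).to(device)
optimizer = torch.optim.AdamW(clf.parameters(), lr=2e-5)

enc = clf_tok(train_texts, padding=True, truncation=True, return_tensors="pt").to(device)
labels = torch.tensor(train_labels, device=device)

clf.train()
for _ in range(3):  # a few epochs for illustration only
    logits = clf(**enc).logits
    # Label smoothing regularizes against noise in the synthetic labels;
    # the paper additionally uses temporal ensembling, omitted in this sketch.
    loss = F.cross_entropy(logits, labels, label_smoothing=0.1)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"loss = {loss.item():.4f}")
```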
