Paper Title

Generating Training Data with Language Models: Towards Zero-Shot Language Understanding

Paper Authors

Yu Meng, Jiaxin Huang, Yu Zhang, Jiawei Han

Paper Abstract

Pretrained language models (PLMs) have demonstrated remarkable performance in various natural language processing tasks: Unidirectional PLMs (e.g., GPT) are well known for their superior text generation capabilities; bidirectional PLMs (e.g., BERT) have been the prominent choice for natural language understanding (NLU) tasks. While both types of models have achieved promising few-shot learning performance, their potential for zero-shot learning has been underexplored. In this paper, we present a simple approach that uses both types of PLMs for fully zero-shot learning of NLU tasks without requiring any task-specific data: A unidirectional PLM generates class-conditioned texts guided by prompts, which are used as the training data for fine-tuning a bidirectional PLM. With quality training data selected based on the generation probability and regularization techniques (label smoothing and temporal ensembling) applied to the fine-tuning stage for better generalization and stability, our approach demonstrates strong performance across seven classification tasks of the GLUE benchmark (e.g., 72.3/73.8 on MNLI-m/mm and 92.8 on SST-2), significantly outperforming zero-shot prompting methods and achieving even comparable results to strong few-shot approaches using 32 training samples per class.
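
The abstract describes a two-stage pipeline: a unidirectional PLM generates class-conditioned texts from label-descriptive prompts, the generations are filtered by their generation probability, and a bidirectional PLM is then fine-tuned on the resulting synthetic data with regularization. The sketch below is a minimal illustration of that idea using Hugging Face Transformers, not the authors' released implementation. It assumes GPT-2 as the generator, BERT-base as the classifier, hand-written sentiment prompts, a simple average-log-probability filter, and a fixed label-smoothing value of 0.1; temporal ensembling is omitted.

```python
# Minimal sketch (assumptions noted above) of: (1) class-conditioned generation
# with a unidirectional PLM, (2) quality selection by generation probability,
# (3) fine-tuning a bidirectional PLM on the synthetic data with label smoothing.
import torch
import torch.nn.functional as F
from transformers import (
    GPT2LMHeadModel, GPT2Tokenizer,
    BertForSequenceClassification, BertTokenizer,
)

device = "cuda" if torch.cuda.is_available() else "cpu"

# --- Stage 1: class-conditioned generation with a unidirectional PLM ---
gen_tok = GPT2Tokenizer.from_pretrained("gpt2")
gen_lm = GPT2LMHeadModel.from_pretrained("gpt2").to(device).eval()

# Hypothetical label-descriptive prompts for a binary sentiment task (SST-2 style).
prompts = {
    0: 'The movie review in negative sentiment is: "',
    1: 'The movie review in positive sentiment is: "',
}

def generate_samples(label, num_samples=8, max_new_tokens=40):
    """Sample texts conditioned on a class prompt and score each one by the
    average per-token log-probability of its generated continuation."""
    inputs = gen_tok(prompts[label], return_tensors="pt").to(device)
    outputs = gen_lm.generate(
        **inputs,
        do_sample=True,
        top_k=50,
        max_new_tokens=max_new_tokens,
        num_return_sequences=num_samples,
        pad_token_id=gen_tok.eos_token_id,
        return_dict_in_generate=True,
        output_scores=True,
    )
    prompt_len = inputs["input_ids"].shape[1]
    gen_ids = outputs.sequences[:, prompt_len:]
    # Per-step logits stacked to shape (num_samples, steps, vocab).
    logits = torch.stack(outputs.scores, dim=1)
    logprobs = F.log_softmax(logits, dim=-1)
    token_lp = logprobs.gather(-1, gen_ids.unsqueeze(-1)).squeeze(-1)
    avg_lp = token_lp.mean(dim=1)  # crude proxy for generation probability
    texts = gen_tok.batch_decode(gen_ids, skip_special_tokens=True)
    return list(zip(texts, avg_lp.tolist()))

# --- Stage 2: keep the most probable generations per class as training data ---
train_texts, train_labels = [], []
for label in prompts:
    scored = sorted(generate_samples(label), key=lambda x: -x[1])
    for text, _ in scored[: len(scored) // 2]:
        train_texts.append(text.strip())
        train_labels.append(label)

# --- Stage 3: fine-tune a bidirectional PLM on the synthetic data ---
clf_tok = BertTokenizer.from_pretrained("bert-base-uncased")
clf = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
).to(device)
optimizer = torch.optim.AdamW(clf.parameters(), lr=2e-5)

enc = clf_tok(train_texts, padding=True, truncation=True, return_tensors="pt").to(device)
labels = torch.tensor(train_labels, device=device)

clf.train()
for _ in range(3):  # a few epochs for illustration only
    logits = clf(**enc).logits
    # Label smoothing regularizes against noise in the synthetic labels;
    # the paper additionally uses temporal ensembling, omitted in this sketch.
    loss = F.cross_entropy(logits, labels, label_smoothing=0.1)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"loss = {loss.item():.4f}")
```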
