Paper Title


Self-Instruct: Aligning Language Models with Self-Generated Instructions

Authors

Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, Hannaneh Hajishirzi

Abstract


Large "instruction-tuned" language models (i.e., finetuned to respond to instructions) have demonstrated a remarkable ability to generalize zero-shot to new tasks. Nevertheless, they depend heavily on human-written instruction data that is often limited in quantity, diversity, and creativity, therefore hindering the generality of the tuned model. We introduce Self-Instruct, a framework for improving the instruction-following capabilities of pretrained language models by bootstrapping off their own generations. Our pipeline generates instructions, input, and output samples from a language model, then filters invalid or similar ones before using them to finetune the original model. Applying our method to the vanilla GPT3, we demonstrate a 33% absolute improvement over the original model on Super-NaturalInstructions, on par with the performance of InstructGPT-001, which was trained with private user data and human annotations. For further evaluation, we curate a set of expert-written instructions for novel tasks, and show through human evaluation that tuning GPT3 with Self-Instruct outperforms using existing public instruction datasets by a large margin, leaving only a 5% absolute gap behind InstructGPT-001. Self-Instruct provides an almost annotation-free method for aligning pre-trained language models with instructions, and we release our large synthetic dataset to facilitate future studies on instruction tuning. Our code and data are available at https://github.com/yizhongw/self-instruct.
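The filtering step of the pipeline (discarding generated instructions that are too similar to ones already in the task pool) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the generation step is omitted, and `difflib.SequenceMatcher` stands in for the ROUGE-L similarity the paper uses; the `0.7` threshold and the helper names `is_novel` / `self_instruct_filter` are assumptions for this sketch.

```python
from difflib import SequenceMatcher


def is_novel(candidate, pool, threshold=0.7):
    """Keep a generated instruction only if it is sufficiently
    dissimilar from every instruction already in the pool.
    (SequenceMatcher ratio stands in for ROUGE-L overlap here.)"""
    return all(
        SequenceMatcher(None, candidate, existing).ratio() < threshold
        for existing in pool
    )


def self_instruct_filter(seed_pool, generated):
    """One filtering pass of the bootstrapping loop: grow the task
    pool with generated instructions that pass the novelty check,
    dropping empty strings and near-duplicates."""
    pool = list(seed_pool)
    for cand in generated:
        if cand.strip() and is_novel(cand, pool):
            pool.append(cand)
    return pool
```

In the full pipeline this check runs each round: instructions sampled from the model are filtered against the growing pool before their input-output instances are generated and used for finetuning.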
