Paper Title
Large Language Models Are Reasoning Teachers
Paper Authors
Paper Abstract
Recent works have shown that chain-of-thought (CoT) prompting can elicit language models to solve complex reasoning tasks, step-by-step. However, prompt-based CoT methods are dependent on very large models such as GPT-3 175B which are prohibitive to deploy at scale. In this paper, we use these large models as reasoning teachers to enable complex reasoning in smaller models and reduce model size requirements by several orders of magnitude. We propose Fine-tune-CoT, a method that generates reasoning samples from very large teacher models to fine-tune smaller models. We evaluate our method on a wide range of public models and complex tasks. We find that Fine-tune-CoT enables substantial reasoning capability in small models, far outperforming prompt-based baselines and even the teacher model in many tasks. Additionally, we extend our method by leveraging the teacher model's ability to generate multiple distinct rationales for each original sample. Enriching the fine-tuning data with such diverse reasoning results in a substantial performance boost across datasets, even for very small models. We conduct ablations and sample studies to understand the emergence of reasoning capabilities of student models. Our code implementation and data are available at https://github.com/itsnamgyu/reasoning-teacher.
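The data-generation step described above can be sketched in code: a teacher model samples multiple chain-of-thought rationales per question, rationales whose final answer matches the gold label are kept, and the survivors become fine-tuning pairs for a small student model. This is a minimal illustrative sketch, not the paper's implementation; the `teacher_generate` stub and its outputs are hypothetical stand-ins for a real API call to a large model such as GPT-3 175B.

```python
def teacher_generate(question, n_rationales=2):
    """Hypothetical stand-in for sampling CoT rationales from a teacher LLM.

    Returns (rationale, final_answer) pairs; a real implementation would
    prompt a large model zero-shot with "Let's think step by step." and
    parse the final answer out of each generated rationale.
    """
    return [
        ("There are 3 pairs, and each pair has 2 socks, so 3 * 2 = 6.", "6"),
        ("3 pairs times 2 socks per pair gives 6 socks.", "6"),
    ][:n_rationales]


def build_finetune_samples(dataset, n_rationales=2):
    """Filter teacher rationales by answer correctness and format them as
    prompt/completion pairs for student fine-tuning. Sampling several
    rationales per question is the "diverse reasoning" extension: each
    correct rationale yields its own fine-tuning sample.
    """
    samples = []
    for question, gold_answer in dataset:
        for rationale, answer in teacher_generate(question, n_rationales):
            if answer == gold_answer:  # keep only rationales reaching the gold answer
                samples.append({
                    "prompt": f"{question}\nLet's think step by step.",
                    "completion": f" {rationale} The answer is {answer}.",
                })
    return samples


dataset = [("If I have 3 pairs of socks, how many socks do I have?", "6")]
samples = build_finetune_samples(dataset)
print(len(samples))  # both stub rationales are correct -> two samples
```

In practice the resulting prompt/completion pairs would be passed to a standard fine-tuning pipeline for the smaller student model.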