Paper Title


OPT: Open Pre-trained Transformer Language Models

Paper Authors

Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer

Paper Abstract


Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning. Given their computational cost, these models are difficult to replicate without significant capital. For the few that are available through APIs, no access is granted to the full model weights, making them difficult to study. We present Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which we aim to fully and responsibly share with interested researchers. We show that OPT-175B is comparable to GPT-3, while requiring only 1/7th the carbon footprint to develop. We are also releasing our logbook detailing the infrastructure challenges we faced, along with code for experimenting with all of the released models.
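The released suite spans checkpoints from 125M to 175B parameters, and the authors also release code for experimenting with the models. As an illustration only (not the authors' own release tooling), here is a minimal sketch that assumes the smallest checkpoint is available through the Hugging Face `transformers` hub under the hypothetical-for-this-example identifier `facebook/opt-125m`:

```python
# Minimal sketch: load a small OPT checkpoint and generate a short continuation.
# Assumes the checkpoint is mirrored on the Hugging Face Hub as "facebook/opt-125m"
# and that the `transformers` and `torch` packages are installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-125m"  # larger checkpoints (350M, 1.3B, ...) follow the same pattern
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Large language models are"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding of a short continuation; sampling options can be added as needed.
output_ids = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```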
