Paper Title
Machine Learning Model Sizes and the Parameter Gap
Paper Authors
Paper Abstract
We study trends in model size of notable machine learning systems over time using a curated dataset. From 1950 to 2018, model size in language models increased steadily by seven orders of magnitude. The trend then accelerated, with model size increasing by another five orders of magnitude in just 4 years from 2018 to 2022. Vision models grew at a more constant pace, totaling 7 orders of magnitude of growth between 1950 and 2022. We also identify that, since 2020, there have been many language models below 20B parameters, many models above 70B parameters, but a scarcity of models in the 20-70B parameter range. We refer to that scarcity as the parameter gap. We provide some stylized facts about the parameter gap and propose a few hypotheses to explain it. The explanations we favor are: (a) increasing model size beyond 20B parameters requires adopting different parallelism techniques, which makes mid-sized models less cost-effective, (b) GPT-3 was one order of magnitude larger than previous language models, and researchers afterwards primarily experimented with bigger models to outperform it. While these dynamics likely exist, and we believe they play some role in generating the gap, we don't have high confidence that there are no other, more important dynamics at play.
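To make the abstract's quantities concrete, below is a minimal Python sketch of the two calculations it describes: growth measured in orders of magnitude (base-10 logarithm of the ratio of parameter counts) and a filter for models falling inside the 20B-70B "parameter gap". The records, names, and thresholds here are illustrative assumptions, not the paper's actual curated dataset.

```python
import math

# Hypothetical records in the spirit of the paper's curated dataset:
# (system, publication year, parameter count). Values are illustrative only.
models = [
    ("early-LM",   1950, 1e2),
    ("mid-LM",     2018, 1e9),
    ("GPT-3-like", 2020, 175e9),
    ("small-LM",   2021, 6e9),
    ("large-LM",   2022, 280e9),
]

def orders_of_magnitude(params_start, params_end):
    """Growth between two parameter counts, in orders of magnitude (log10 of the ratio)."""
    return math.log10(params_end / params_start)

# E.g. growth from 1e2 to 1e9 parameters corresponds to seven orders of magnitude.
print(orders_of_magnitude(1e2, 1e9))  # -> 7.0

# The "parameter gap": since 2020, few models land between 20B and 70B parameters.
GAP_LOW, GAP_HIGH = 20e9, 70e9
in_gap = [name for name, year, params in models
          if year >= 2020 and GAP_LOW <= params <= GAP_HIGH]
print(in_gap)  # -> [] for this illustrative sample, mirroring the scarcity described above
```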