Paper Title
General-Purpose In-Context Learning by Meta-Learning Transformers
Paper Authors
Paper Abstract
Modern machine learning requires system designers to specify aspects of the learning pipeline, such as losses, architectures, and optimizers. Meta-learning, or learning-to-learn, instead aims to learn those aspects, and promises to unlock greater capabilities with less manual effort. One particularly ambitious goal of meta-learning is to train general-purpose in-context learning algorithms from scratch, using only black-box models with minimal inductive bias. Such a model takes in training data, and produces test-set predictions across a wide range of problems, without any explicit definition of an inference model, training loss, or optimization algorithm. In this paper we show that Transformers and other black-box models can be meta-trained to act as general-purpose in-context learners. We characterize transitions between algorithms that generalize, algorithms that memorize, and algorithms that fail to meta-train at all, induced by changes in model size, number of tasks, and meta-optimization. We further show that the capabilities of meta-trained algorithms are bottlenecked by the accessible state size (memory) determining the next prediction, unlike standard models which are thought to be bottlenecked by parameter count. Finally, we propose practical interventions such as biasing the training distribution that improve the meta-training and meta-generalization of general-purpose in-context learning algorithms.
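To make the setup concrete, the sketch below shows the episode format the abstract describes: a black-box model receives labelled (x, y) pairs in context plus one unlabelled query, and emits a prediction with no explicit inference model or optimizer. The task family, episode sizes, and the stand-in predictor (a 1-nearest-neighbour rule instead of a meta-trained Transformer) are all illustrative assumptions, not the paper's actual model.

```python
import random

def sample_task(rng, dim=2):
    # Hypothetical task family for illustration: random linear functions.
    w = [rng.gauss(0, 1) for _ in range(dim)]
    return lambda x: sum(wi * xi for wi, xi in zip(w, x))

def make_episode(task, rng, n_train=8, dim=2):
    # An episode packs n_train labelled (x, y) pairs plus one unlabelled query input.
    xs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_train + 1)]
    ys = [task(x) for x in xs]
    context = list(zip(xs[:n_train], ys[:n_train]))
    return context, xs[-1], ys[-1]

def black_box_predict(context, query):
    # Stand-in for the meta-trained Transformer: a 1-nearest-neighbour rule
    # over the in-context examples. It exposes the same interface the paper
    # describes (context + query -> prediction), but is NOT the paper's model.
    def sq_dist(x):
        return sum((a - b) ** 2 for a, b in zip(x, query))
    _, y = min(context, key=lambda pair: sq_dist(pair[0]))
    return y

rng = random.Random(0)
task = sample_task(rng)
context, query, target = make_episode(task, rng)
prediction = black_box_predict(context, query)
```

Meta-training would then adjust the black-box model's weights so that, averaged over many such episodes drawn from many tasks, the prediction matches the held-out target.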