Title
Statistical mechanics of continual learning: variational principle and mean-field potential
Authors
Abstract
Continual learning of multiple tasks of different nature is an obstacle on the way to artificial general intelligence. Various heuristic tricks, from both machine-learning and neuroscience angles, have recently been proposed, but they lack a unified theoretical foundation. Here, we focus on continual learning in single-layered and multi-layered neural networks with binary weights. A variational Bayesian learning setting is proposed, in which the networks are trained in a field space rather than in the discrete-weight space, where gradients are ill-defined; weight uncertainty is thereby naturally incorporated and modulates the synaptic resources allocated among tasks. From a physics perspective, we translate variational continual learning into the Franz-Parisi thermodynamic-potential framework, where knowledge of previous tasks acts as both a prior and a reference. We thus interpret the continual learning of a binary perceptron in a teacher-student setting as a Franz-Parisi potential computation. The learning performance can then be studied analytically with mean-field order parameters, whose predictions coincide with numerical experiments using stochastic gradient descent. Based on the variational principle and a Gaussian-field approximation of the internal preactivations in hidden layers, we also derive a learning algorithm that takes weight uncertainty into account; it solves continual learning with binary weights in multi-layered neural networks and outperforms the currently available metaplasticity algorithm. Our principled framework also connects to elastic weight consolidation, weight-uncertainty-modulated learning, and neuroscience-inspired metaplasticity, providing a theory-grounded method for real-world multi-task learning with deep networks.
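For reference, the Franz-Parisi potential invoked above has the following standard spin-glass form (a sketch of the generic definition; the paper's task energies and temperatures may be parameterized differently). It is the constrained free energy of a configuration trained on the new task at fixed overlap with a reference equilibrated on the old task:

```latex
% Franz-Parisi potential: constrained free energy of a configuration w for
% the new task (energy E_2) at overlap q with a reference w* drawn from the
% equilibrium measure of the old task (energy E_1); beta is the inverse
% temperature and N the number of weights.
\begin{equation}
  V(q) = -\frac{1}{\beta N}
  \left\langle \ln \sum_{\mathbf{w}} e^{-\beta E_2(\mathbf{w})}\,
  \delta\!\left( Nq - \mathbf{w}\cdot\mathbf{w}^{*} \right)
  \right\rangle_{\mathbf{w}^{*} \sim\, e^{-\beta E_1(\mathbf{w}^{*})}/Z_1}
\end{equation}
```

In the continual-learning reading, the reference $\mathbf{w}^{*}$ encodes the knowledge of the previous task, playing the dual role of prior and reference stated in the abstract.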
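To illustrate the field-space training and Gaussian-field approximation described in the abstract, here is a minimal Python/NumPy sketch for a single binary-weight perceptron in a teacher-student setting. Each weight $w_i=\pm 1$ is parameterized by a continuous field $\theta_i$ with mean $m_i=\tanh\theta_i$ and variance $1-m_i^2$, the preactivation is treated as Gaussian, and gradients are taken with respect to the fields. The loss choice (Gaussian sign-error probability), the learning rate, and all variable names are illustrative assumptions, not the paper's exact algorithm.

```python
# Minimal sketch (not the paper's exact algorithm): variational training of a
# binary-weight perceptron in field space. The discrete weights never receive
# gradients directly; the continuous fields theta do.
import numpy as np

rng = np.random.default_rng(0)
N = 201                                      # odd, so teacher labels are never 0
teacher = rng.choice([-1.0, 1.0], size=N)    # teacher's binary weights
theta = 0.01 * rng.standard_normal(N)        # variational fields (trained params)
lr = 0.5

def sgd_step(theta, x, y, lr):
    m = np.tanh(theta)                       # E[w_i]
    dm = 1.0 - m**2                          # dm/dtheta_i, also Var[w_i]
    mu = m @ x                               # Gaussian mean of z = w . x
    var = dm @ (x**2) + 1e-12                # Gaussian variance of z
    sigma = np.sqrt(var)
    u = y * mu / sigma
    # Loss = P(sign error) = 0.5*erfc(u/sqrt(2)); dLoss/du = -phi(u),
    # the standard normal density.
    phi = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    dmu = dm * x                             # dmu/dtheta_i
    dsigma = -m * dm * x**2 / sigma          # dsigma/dtheta_i
    du = y * (dmu * sigma - mu * dsigma) / var
    grad = -phi * du                         # dLoss/dtheta_i
    return theta - lr * grad

for t in range(5000):
    x = rng.choice([-1.0, 1.0], size=N)
    y = np.sign(teacher @ x)                 # teacher label
    theta = sgd_step(theta, x, y, lr)

w_student = np.sign(np.tanh(theta))          # binarize for deployment
print(f"teacher-student overlap: {w_student @ teacher / N:.3f}")
```

For continual learning in the spirit of the abstract, the posterior fields learned on one task would serve as the prior for the next, e.g., through a penalty pulling theta toward its previous-task values, with the per-weight uncertainty $1-m_i^2$ modulating how strongly each synapse is protected; that extension is omitted here for brevity.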