Paper Title

RLEKF: An Optimizer for Deep Potential with Ab Initio Accuracy

Authors

Hu, Siyu; Zhang, Wentao; Sha, Qiuchen; Pan, Feng; Wang, Lin-Wang; Jia, Weile; Tan, Guangming; Zhao, Tong

Abstract

It is imperative to accelerate the training of neural network force fields such as Deep Potential, which usually requires thousands of images based on first-principles calculations and a couple of days to generate an accurate potential energy surface. To this end, we propose a novel optimizer named reorganized layer extended Kalman filtering (RLEKF), an optimized version of global extended Kalman filtering (GEKF) with a strategy of splitting big layers and gathering small ones to overcome the $O(N^2)$ computational cost of GEKF. This strategy approximates the dense weight error covariance matrix of GEKF with a sparse block-diagonal matrix. We implement both RLEKF and the baseline Adam in our $α$Dynamics package and perform numerical experiments on 13 unbiased datasets. Overall, RLEKF converges faster with slightly better accuracy. For example, a test on a typical system, bulk copper, shows that RLEKF converges faster in both the number of training epochs ($\times$11.67) and wall-clock time ($\times$1.19). Moreover, we theoretically prove that the updates of the weights converge and thus guard against the gradient-exploding problem. Experimental results verify that RLEKF is not sensitive to the initialization of weights. RLEKF sheds light on other AI-for-science applications in which training a large neural network (with tens of thousands of parameters) is a bottleneck.
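
To make the split-big/gather-small idea concrete, below is a minimal Python sketch (using NumPy) of how layer weight counts could be reorganized into blocks no larger than a threshold, and how an extended-Kalman-filter update could then keep one error-covariance matrix per block instead of a single dense matrix as in GEKF. The threshold, forgetting factor `lam`, noise `r`, and the per-sample scalar-error update are illustrative assumptions, not the exact algorithm or API of the $α$Dynamics package.

```python
import numpy as np

def reorganize_layers(layer_sizes, threshold):
    """Split layers with more than `threshold` weights into chunks and merge
    consecutive small layers, so every resulting block holds at most
    `threshold` weights (the split-big / gather-small strategy)."""
    blocks, gathered = [], 0
    for n in layer_sizes:
        if n > threshold:                        # split a big layer
            if gathered:
                blocks.append(gathered)
                gathered = 0
            full, rest = divmod(n, threshold)
            blocks.extend([threshold] * full)
            if rest:
                blocks.append(rest)
        elif gathered + n > threshold:           # close the current gathered block
            blocks.append(gathered)
            gathered = n
        else:                                    # keep gathering small layers
            gathered += n
    if gathered:
        blocks.append(gathered)
    return blocks

class BlockDiagonalEKF:
    """EKF update with one error-covariance matrix per block, i.e. the dense
    n-by-n covariance of GEKF approximated by a block-diagonal matrix."""

    def __init__(self, block_sizes, p0=1.0, lam=0.997, r=1.0):
        self.sizes = block_sizes
        self.P = [p0 * np.eye(m) for m in block_sizes]   # per-block covariance
        self.lam = lam                                   # forgetting factor (assumed)
        self.r = r                                       # measurement noise (assumed)

    def step(self, w, grad, err):
        """Update the flat weight vector `w` given the gradient `grad` of the
        scalar prediction w.r.t. the weights and the prediction error `err`."""
        start = 0
        for b, m in enumerate(self.sizes):
            h = grad[start:start + m]                    # block slice of the Jacobian
            Ph = self.P[b] @ h
            k = Ph / (self.lam * self.r + h @ Ph)        # Kalman gain of this block
            w[start:start + m] += k * err
            self.P[b] = (self.P[b] - np.outer(k, Ph)) / self.lam
            start += m
        return w
```

For instance, with an assumed threshold of 4000 weights, layers of sizes [120, 9000, 80, 60] would be reorganized into blocks of [120, 4000, 4000, 1000, 140]; each block then carries its own covariance matrix, so the per-step cost scales with the squares of the block sizes rather than with the square of the total number of weights.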
