Paper title

Accurate global machine learning force fields for molecules with hundreds of atoms

Authors

Stefan Chmiela, Valentin Vassilev-Galindo, Oliver T. Unke, Adil Kabylda, Huziel E. Sauceda, Alexandre Tkatchenko, Klaus-Robert Müller

Abstract

Global machine learning force fields (MLFFs), which have the capacity to capture collective many-atom interactions in molecular systems, currently only scale up to a few dozen atoms due to a considerable growth of the model complexity with system size. For larger molecules, locality assumptions are typically introduced, with the consequence that non-local interactions are poorly or not at all described, even if those interactions are contained within the reference ab initio data. Here, we approach this challenge and develop an exact iterative parameter-free approach to train global symmetric gradient domain machine learning (sGDML) force fields for systems with up to several hundred atoms, without resorting to any localization of atomic interactions or other potentially uncontrolled approximations. This means that all atomic degrees of freedom remain fully correlated in the global sGDML FF, allowing the accurate description of complex molecules and materials that present phenomena with far-reaching characteristic correlation lengths. We assess the accuracy and efficiency of our MLFFs on a newly developed MD22 benchmark dataset containing molecules from 42 to 370 atoms. The robustness of our approach is demonstrated in nanosecond-long path-integral molecular dynamics simulations for the supramolecular complexes in the MD22 dataset.
