Paper title
Forces are not Enough: Benchmark and Critical Evaluation for Machine Learning Force Fields with Molecular Simulations
Paper authors
Paper abstract
Molecular dynamics (MD) simulation techniques are widely used in natural science applications. Machine learning (ML) force field (FF) models are increasingly replacing ab initio simulations by predicting forces directly from atomic structures. Despite significant progress in this area, such techniques are primarily benchmarked by their force/energy prediction errors, even though the practical use case is to produce realistic MD trajectories. We aim to fill this gap by introducing a novel benchmark suite for learned MD simulation. We curate representative MD systems, including water, organic molecules, a peptide, and materials, and design evaluation metrics corresponding to the scientific objectives of the respective systems. We benchmark a collection of state-of-the-art (SOTA) ML FF models and illustrate, in particular, how the commonly benchmarked force accuracy is not well aligned with relevant simulation metrics. We demonstrate when and how selected SOTA methods fail, and offer directions for further improvement. Specifically, we identify stability as a key metric for ML models to improve. Our benchmark suite comes with a comprehensive open-source codebase for training and simulation with ML FFs to facilitate future work.