Paper Title

Predicting Kovats Retention Indices Using Graph Neural Networks

Paper Authors

Qu, Chen; Schneider, Barry I.; Kearsley, Anthony J.; Keyrouz, Walid; Allison, Thomas C.

Abstract

The \kovats retention index is a dimensionless quantity that characterizes the rate at which a compound is processed through a gas chromatography column. This quantity is independent of many experimental variables and, as such, is considered a near-universal descriptor of retention time on a chromatography column. The \kovats retention indices of a large number of molecules have been determined experimentally. The "NIST 20: GC Method\slash Retention Index Library" database has collected and, more importantly, curated the retention indices of a subset of these compounds, resulting in a highly valued reference database. The experimental data in the library form an ideal data set for training machine learning models to predict the retention indices of unknown compounds. In this article, we describe the training of a graph neural network model to predict the \kovats retention index for compounds in the NIST library and compare this approach with previous work \cite{2019Matyushin}. We predict the \kovats retention index with a mean unsigned error of 28 index units, compared to 44 index units for the putative best previous result, obtained with a convolutional neural network \cite{2019Matyushin}. The NIST library also incorporates an estimation scheme based on a group contribution approach, which achieves a mean unsigned error of 114 index units relative to the experimental data. Our method uses the same input data source as the group contribution approach, making it straightforward and convenient to apply to existing libraries. Our results convincingly demonstrate the predictive power of systematic, data-driven approaches that apply deep learning methodologies to chemical data and, for the data in the NIST 20 library, outperform previous models.
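For context, the \kovats retention index is conventionally defined (this is the standard isothermal chromatography definition, not quoted from the abstract) from the adjusted retention times $t'$ of the analyte $x$ and of the two n-alkanes, with $n$ and $n+1$ carbon atoms, that elute immediately before and after it:

\[
I_x = 100\left[\, n + \frac{\log t'_x - \log t'_n}{\log t'_{n+1} - \log t'_n} \,\right]
\]

Because $I_x$ is expressed relative to the n-alkane scale, it is dimensionless, which is why it serves as a near-universal descriptor of retention behavior.

The abstract does not specify the network architecture, so the following is only a minimal, hypothetical sketch (in Python, assuming PyTorch and the PyTorch Geometric library) of the general idea of regressing a scalar retention index from a molecular graph; the layer types, hidden size, and pooling choice are illustrative assumptions, not the authors' model.

# Hypothetical sketch only -- the paper's actual architecture, molecular
# features, and hyperparameters are not given in the abstract.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, global_mean_pool

class RetentionIndexGNN(torch.nn.Module):
    """Regresses a scalar (e.g., a Kovats-type retention index) from a molecular graph."""

    def __init__(self, num_node_features: int, hidden_dim: int = 128):
        super().__init__()
        self.conv1 = GCNConv(num_node_features, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, hidden_dim)
        self.readout = torch.nn.Linear(hidden_dim, 1)

    def forward(self, x, edge_index, batch):
        # Message passing over atoms (nodes) connected by bonds (edges).
        h = F.relu(self.conv1(x, edge_index))
        h = F.relu(self.conv2(h, edge_index))
        # Pool atom embeddings into one vector per molecule, then predict a scalar.
        h = global_mean_pool(h, batch)
        return self.readout(h).squeeze(-1)

# Such a model would typically be trained with an L1 or MSE loss against the
# experimentally determined retention indices in the library.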
