Paper title
Machine-learning prediction of infrared spectra of interstellar polycyclic aromatic hydrocarbons
Authors
Abstract
We design and train a neural network (NN) model to efficiently predict the infrared spectra of interstellar polycyclic aromatic hydrocarbons (PAHs) with a computational cost many orders of magnitude lower than what a first-principles calculation would demand. The input to the NN is based on the Morgan fingerprints extracted from the skeletal formulas of the molecules and does not require precise geometrical information such as interatomic distances. The model shows excellent predictive skill for out-of-sample inputs, making it suitable for improving the mixture models currently used for understanding the chemical composition and evolution of the interstellar medium. We also identify the constraints on its applicability caused by the limited diversity of the training data and estimate the prediction errors using an ensemble of NNs trained on subsets of the data. With the help of other machine-learning methods, such as random forests, we dissect the role of different chemical features in this prediction. The power of these topological descriptors is demonstrated by the limited effect of including detailed geometrical information in the form of Coulomb matrix eigenvalues.
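The abstract contrasts topology-only Morgan-fingerprint inputs with geometry-based Coulomb matrix eigenvalues. As a minimal sketch, assuming the commonly used Coulomb matrix definition (diagonal entries 0.5 * Z_i^2.4, off-diagonal entries Z_i * Z_j / |R_i - R_j|) and a sort-by-magnitude, zero-pad-to-fixed-length convention, such a descriptor could be computed with NumPy as below. The function names `coulomb_matrix` and `coulomb_eigenvalues` and the `size` parameter are hypothetical, and the exact conventions used in this work may differ.

```python
import numpy as np


def coulomb_matrix(Z, R):
    """Coulomb matrix for nuclear charges Z (n,) and Cartesian positions R (n, 3).

    Positions may be in any length unit, as long as it is used consistently
    (assumption: no particular unit convention is implied here).
    """
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    # Pairwise interatomic distances |R_i - R_j|
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    # Off-diagonal: Z_i * Z_j / |R_i - R_j|; the zero diagonal distances give inf,
    # which is overwritten below, so the divide-by-zero warning is suppressed
    with np.errstate(divide="ignore"):
        M = np.outer(Z, Z) / dist
    # Diagonal: the 0.5 * Z_i^2.4 self-interaction term of the standard definition
    np.fill_diagonal(M, 0.5 * Z ** 2.4)
    return M


def coulomb_eigenvalues(Z, R, size):
    """Eigenvalues of the Coulomb matrix, sorted by descending absolute value
    and zero-padded to `size` so molecules of different sizes share one input length."""
    M = coulomb_matrix(Z, R)
    # M is real symmetric, so eigvalsh returns real eigenvalues
    eig = np.linalg.eigvalsh(M)
    if len(eig) > size:
        raise ValueError("molecule has more atoms than the descriptor size")
    eig = eig[np.argsort(-np.abs(eig))]
    out = np.zeros(size)
    out[: len(eig)] = eig
    return out


# Usage: a two-atom toy example with a separation of 1.4 length units
print(coulomb_eigenvalues([1, 1], [[0.0, 0.0, 0.0], [0.0, 0.0, 1.4]], size=4))
```

Because the eigenvalues are unchanged by reordering the atoms or by rigidly rotating and translating the molecule, this descriptor does not depend on how atoms are indexed; it does, however, require 3D coordinates, which is precisely the information the fingerprint-based input avoids.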