Paper Title
MOFormer: Self-Supervised Transformer model for Metal-Organic Framework Property Prediction
Paper Authors
Paper Abstract
Metal-Organic Frameworks (MOFs) are materials with a high degree of porosity that can be used for applications in energy storage, water desalination, gas storage, and gas separation. However, the chemical space of MOFs is close to infinite in size due to the large variety of possible combinations of building blocks and topologies. Discovering the optimal MOFs for specific applications requires an efficient and accurate search over an enormous number of potential candidates. Previous high-throughput screening methods using computational simulations like DFT can be time-consuming. Such methods also require optimizing the 3D atomic structures of MOFs, which adds an extra step when evaluating hypothetical MOFs. In this work, we propose a structure-agnostic deep learning method based on the Transformer model, named MOFormer, for property prediction of MOFs. The MOFormer takes a text string representation of a MOF (MOFid) as input, thus circumventing the need to obtain the 3D structures of hypothetical MOFs and accelerating the screening process. Furthermore, we introduce a self-supervised learning framework that pretrains the MOFormer by maximizing the cross-correlation between its structure-agnostic representations and the structure-based representations from a crystal graph convolutional neural network (CGCNN) on >400k publicly available MOF data. Using self-supervised learning allows the MOFormer to intrinsically learn 3D structural information even though it is not included in the input. Experiments show that pretraining improved the prediction accuracy of both models on various downstream prediction tasks. Furthermore, we revealed that MOFormer can be more data-efficient on quantum-chemical property prediction than the structure-based CGCNN when training data is limited. Overall, MOFormer provides a novel perspective on efficient MOF design using deep learning.
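The pretraining objective described above — maximizing the cross-correlation between the Transformer's structure-agnostic embeddings and the CGCNN's structure-based embeddings — can be sketched with a Barlow Twins-style loss. The following is a minimal NumPy illustration, not the paper's exact implementation; the function name, the batch/embedding shapes, and the off-diagonal weight `lam` are assumptions for the sketch.

```python
import numpy as np

def cross_correlation_loss(z_a, z_b, lam=5e-3):
    """Barlow Twins-style objective on two embedding views.

    z_a: structure-agnostic embeddings (e.g. from MOFormer), shape (N, D)
    z_b: structure-based embeddings (e.g. from CGCNN),       shape (N, D)

    Pushes the D x D cross-correlation matrix toward the identity:
    diagonal terms toward 1 (the two views agree per dimension),
    off-diagonal terms toward 0 (redundancy between dimensions is reduced).
    """
    # Standardize each embedding dimension across the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + 1e-8)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + 1e-8)

    n = z_a.shape[0]
    c = z_a.T @ z_b / n  # (D, D) cross-correlation matrix

    on_diag = ((np.diag(c) - 1.0) ** 2).sum()
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()
    return on_diag + lam * off_diag
```

Minimizing this loss during pretraining aligns the two encoders' representations, which is how the text-only MOFormer can pick up 3D structural information without ever seeing coordinates: identical inputs give a near-zero loss, while uncorrelated embeddings are penalized on the diagonal.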