Paper Title
Molecular Joint Representation Learning via Multi-modal Information
Paper Authors
Paper Abstract
In recent years, artificial intelligence has played an important role in accelerating the whole process of drug discovery. Various molecular representation schemes of different modalities (e.g., textual sequences or graphs) have been developed. By encoding them digitally, different chemical information can be learned through corresponding network structures. Molecular graphs and the Simplified Molecular Input Line Entry System (SMILES) are currently popular means for molecular representation learning. Previous works have attempted to combine the two to address the loss of specific information inherent in single-modal representations across various tasks. To further fuse such multi-modal information, the correspondence between chemical features learned from different representations should be considered. To this end, we propose a novel framework for molecular joint representation learning via Multi-Modal information from SMILES and molecular Graphs, called MMSG. We improve the self-attention mechanism by introducing bond-level graph representations as an attention bias in the Transformer, reinforcing the feature correspondence between multi-modal information. We further propose a Bidirectional Message Communication Graph Neural Network (BMC GNN) to strengthen the information flow aggregated from graphs for further combination. Numerous experiments on public property prediction datasets demonstrate the effectiveness of our model.
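The core idea mentioned in the abstract, adding a graph-derived bias to the attention logits, can be illustrated with a minimal sketch. This is not the paper's actual MMSG implementation: the function name, shapes, and the precomputed `bias` matrix (standing in for the bond-level graph representation) are illustrative assumptions.

```python
import numpy as np

def attention_with_bias(Q, K, V, bias):
    """Scaled dot-product attention with an additive bias term.

    Sketch only: `bias` is a hypothetical per-pair matrix standing in
    for the bond-level graph representation described in the abstract.
    It is added to the attention logits before softmax, so graph-derived
    structure can steer token-to-token attention.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k) + bias  # inject graph information here
    # numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# toy example: 4 tokens, hidden dimension 8
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
bias = rng.standard_normal((4, 4))  # hypothetical bond-level attention bias

out = attention_with_bias(Q, K, V, bias)
print(out.shape)  # (4, 8)
```

Setting `bias` to zero recovers standard scaled dot-product attention, which makes the graph-conditioned variant a drop-in modification of a Transformer layer.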