Paper title
Enzyme promiscuity prediction using hierarchy-informed multi-label classification
Paper authors
Paper abstract
As experimental efforts are costly and time-consuming, computational characterization of enzyme capabilities is an attractive alternative. We present and evaluate several machine-learning models to predict which of 983 distinct enzymes, as defined by their Enzyme Commission (EC) numbers, are likely to interact with a given query molecule. Our data consist of enzyme-substrate interactions from the BRENDA database. Some interactions are attributed to natural selection and involve the enzyme's natural substrates. The majority of the interactions, however, involve non-natural substrates and thus reflect promiscuous enzymatic activities. We frame this enzyme promiscuity prediction problem as a multi-label classification task. We make maximal use of inhibitor and unlabelled data to train prediction models that can take advantage of known hierarchical relationships between enzyme classes. We report that a hierarchical multi-label neural network, EPP-HMCNF, is the best model for solving this problem, outperforming both k-nearest-neighbors similarity-based models and other machine-learning models. We show that including inhibitor information during training consistently improves predictive power, particularly for EPP-HMCNF. We also show that all promiscuity prediction models perform worse under a realistic data split than under a random data split, and worse on non-natural substrates than on natural substrates. We provide Python code for EPP-HMCNF and other models in a repository termed EPP (Enzyme Promiscuity Prediction) at https://github.com/hassounlab/EPP.
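The multi-label framing with hierarchical enzyme classes can be illustrated with a minimal sketch. It assumes that each molecule's label vector marks both the EC numbers it interacts with and all of their ancestor classes in the EC hierarchy. The abstract does not specify the encoding actually used by EPP-HMCNF, so the function names (`ec_ancestors`, `build_label_index`, `encode_labels`) and the example EC numbers below are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List


def ec_ancestors(ec: str) -> List[str]:
    """Return an EC number together with its ancestor classes, root first.

    Example: '1.1.1.1' -> ['1', '1.1', '1.1.1', '1.1.1.1'].
    """
    parts = ec.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def build_label_index(ec_numbers: Iterable[str]) -> Dict[str, int]:
    """Map every node of the EC hierarchy (leaves and ancestors) to a column."""
    nodes = sorted({node for ec in ec_numbers for node in ec_ancestors(ec)})
    return {node: i for i, node in enumerate(nodes)}


def encode_labels(ecs: Iterable[str], index: Dict[str, int]) -> List[int]:
    """Build a binary multi-label vector for one molecule.

    Assumption: a positive leaf label also activates all of its ancestors,
    so the vector is consistent with the hierarchy.
    """
    vec = [0] * len(index)
    for ec in ecs:
        for node in ec_ancestors(ec):
            vec[index[node]] = 1
    return vec


# Hypothetical label space of three enzymes (the paper uses 983).
label_index = build_label_index(["1.1.1.1", "1.1.1.2", "2.7.1.1"])

# A query molecule that interacts with two of the three enzymes.
target = encode_labels(["1.1.1.2", "2.7.1.1"], label_index)
```

Under this encoding, a classifier produces one score per hierarchy node, and a prediction is consistent with the hierarchy only if no child class scores higher than its parent. How inhibitor and unlabelled data enter the training targets is not covered by this sketch.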