Machine Learning Models to Predict Inhibition of the Bile Salt Export Pump

Paper title

Machine Learning Models to Predict Inhibition of the Bile Salt Export Pump

Authors

McLoughlin, Kevin S., Jeong, Claire G., Sweitzer, Thomas D., Minnich, Amanda J., Tse, Margaret J., Bennion, Brian J., Allen, Jonathan E., Calad-Thomson, Stacie, Rush, Thomas S., Brase, James M.

Abstract

Drug-induced liver injury (DILI) is the most common cause of acute liver failure and a frequent reason for withdrawal of candidate drugs during preclinical and clinical testing. An important type of DILI is cholestatic liver injury, caused by buildup of bile salts within hepatocytes; it is frequently associated with inhibition of bile salt transporters, such as the bile salt export pump (BSEP). Reliable in silico models to predict BSEP inhibition directly from chemical structures would significantly reduce costs during drug discovery and could help avoid injury to patients. Unfortunately, models published to date have been insufficiently accurate to encourage wide adoption. We report our development of classification and regression models for BSEP inhibition with substantially improved performance over previously published models. Our model development leveraged the ATOM Modeling PipeLine (AMPL) developed by the ATOM Consortium, which enabled us to train and evaluate thousands of candidate models. In the course of model development, we assessed a variety of schemes for chemical featurization, dataset partitioning and class labeling, and identified those producing models that generalized best to novel chemical entities. Our best performing classification model was a neural network with ROC AUC = 0.88 on our internal test dataset and 0.89 on an independent external compound set. Our best regression model, the first ever reported for predicting BSEP IC50s, yielded a test set $R^2 = 0.56$ and mean absolute error 0.37, corresponding to a mean 2.3-fold error in predicted IC50s, comparable to experimental variation. These models will thus be useful as inputs to mechanistic predictions of DILI and as part of computational pipelines for drug discovery.
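The link between the reported mean absolute error and the "2.3-fold" figure can be checked with a short calculation. The sketch below assumes the MAE of 0.37 is expressed in log10 units of IC50 (the abstract does not state the units explicitly, but this reading is consistent with the 2.3-fold figure). The function name is hypothetical and chosen only for illustration.

```python
def fold_error_from_log10_mae(mae_log10: float) -> float:
    """Convert a mean absolute error in log10(IC50) units to a fold error.

    Assumption: the error is measured on a log10 scale, so an absolute
    error of e corresponds to a multiplicative factor of 10**e.
    """
    return 10.0 ** mae_log10


# MAE reported in the abstract (assumed to be in log10 units).
fold = fold_error_from_log10_mae(0.37)
print(f"Approximate fold error: {fold:.1f}")
```

Under this assumption, the value obtained is, strictly speaking, the geometric mean of the per-compound fold errors, since exponentiating the average of absolute log10 differences yields a geometric rather than an arithmetic mean.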
