Paper Title

Machine Learning-Assisted Recurrence Prediction for Early-Stage Non-Small-Cell Lung Cancer Patients

Authors

Janik, Adrianna; Torrente, Maria; Costabello, Luca; Calvo, Virginia; Walsh, Brian; Camps, Carlos; Mohamed, Sameh K.; Ortega, Ana L.; Nováček, Vít; Massutí, Bartomeu; Minervini, Pasquale; Campelo, M. Rosario Garcia; del Barco, Edel; Bosch-Barrera, Joaquim; Menasalvas, Ernestina; Timilsina, Mohan; Provencio, Mariano

Abstract

Background: Stratifying cancer patients by risk of relapse makes it possible to personalize their care. This work addresses the following research question: how can machine learning be used to estimate the probability of relapse in early-stage non-small-cell lung cancer (NSCLC) patients?

Methods: We train tabular and graph machine learning models to predict relapse in 1,387 early-stage (I-II) NSCLC patients from the Spanish Lung Cancer Group data (mean age 65.7 years; 24.8% female, 75.2% male). We generate automatic explanations for the predictions of these models. For models trained on tabular data, we adopt SHAP local explanations to gauge how each patient feature contributes to the predicted outcome. We explain graph machine learning predictions with an example-based method that highlights influential past patients.

Results: On tabular data, the Random Forest model reaches 76% accuracy at predicting relapse, evaluated with 10-fold cross-validation (the model was trained 10 times with different independent sets of patients in the test, train and validation sets, and the reported metrics are averaged over these 10 test sets). Graph machine learning reaches 68% accuracy on a held-out test set of 200 patients, calibrated on a held-out set of 100 patients.

Conclusions: Our results show that machine learning models trained on tabular and graph data can enable objective, personalised and reproducible prediction of relapse, and therefore of disease outcome, in patients with early-stage NSCLC. With further prospective and multisite validation, and with additional radiological and molecular data, this prognostic model could serve as a predictive decision support tool for deciding on the use of adjuvant treatments in early-stage lung cancer.

Keywords: Non-Small-Cell Lung Cancer, Tumor Recurrence Prediction, Machine Learning
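The evaluation protocol described in the Results (10 rounds, each with disjoint train, validation and test patients, and accuracy averaged over the 10 test sets) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the labels are synthetic, the "model" is a placeholder majority-class predictor rather than the Random Forest used in the paper, and taking the next fold as the validation set is one possible reading of the protocol, not the authors' documented split.

```python
import random
from statistics import mean


def make_folds(n_samples, n_folds, seed=0):
    """Shuffle sample indices and deal them into n_folds disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def majority_label(labels):
    """Placeholder 'model': always predict the most common training label."""
    return max(set(labels), key=labels.count)


def cross_validate(labels, n_folds=10, seed=0):
    """Return the per-fold test accuracies of the placeholder model."""
    folds = make_folds(len(labels), n_folds, seed)
    accuracies = []
    for k in range(n_folds):
        test_idx = folds[k]
        # Assumption: the next fold plays the role of the validation set.
        # The placeholder has nothing to tune, so it is only excluded
        # from training here.
        val_idx = folds[(k + 1) % n_folds]
        held_out = set(test_idx) | set(val_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        prediction = majority_label([labels[i] for i in train_idx])
        correct = sum(labels[i] == prediction for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies


# Synthetic relapse labels (1 = relapse); NOT the study data.
rng = random.Random(42)
labels = [1 if rng.random() < 0.3 else 0 for _ in range(1000)]

fold_accuracies = cross_validate(labels)
print(f"mean accuracy over {len(fold_accuracies)} folds: "
      f"{mean(fold_accuracies):.3f}")
```

Because every patient lands in exactly one test fold, the averaged figure reflects performance on patients the model never saw during training in that round.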
