Paper Title
Deep Mining Generation of Lung Cancer Malignancy Models from Chest X-ray Images
Authors
Abstract
Lung cancer is the leading cause of cancer death and morbidity worldwide. Many studies have shown machine learning models to be effective at detecting lung nodules from chest X-ray images. However, these techniques have yet to be embraced by the medical community due to several practical, ethical, and regulatory constraints stemming from the black-box nature of deep learning models. Additionally, most lung nodules visible on chest X-ray are benign; therefore, the narrow task of computer vision-based lung nodule detection cannot be equated to automated lung cancer detection. Addressing both concerns, this study introduces a novel hybrid deep learning and decision tree-based computer vision model which presents lung cancer malignancy predictions as interpretable decision trees. The deep learning component of this process is trained using a large publicly available dataset on pathological biomarkers associated with lung cancer. These models are then used to infer biomarker scores for chest X-ray images from two independent datasets for which malignancy metadata is available. We mine multi-variate predictive models by fitting shallow decision trees to the malignancy-stratified datasets and interrogate a range of metrics to determine the best model. Our best decision tree model achieves sensitivity and specificity of 86.7% and 80.0%, respectively, with a positive predictive value of 92.9%. Decision trees mined using this method may be considered as a starting point for refinement into clinically useful multi-variate lung cancer malignancy models for implementation as a workflow augmentation tool to improve the efficiency of human radiologists.
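The three metrics reported in the abstract follow directly from a confusion matrix. The sketch below shows the arithmetic in Python. The abstract does not give the underlying counts, so the values used here (13 true positives, 1 false positive, 4 true negatives, 2 false negatives) are an assumption: they are one small set of counts that happens to reproduce the reported percentages, not figures taken from the study. The class and function names are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of a binary malignancy classifier's outcomes on a test set."""
    tp: int  # malignant cases predicted malignant
    fp: int  # benign cases predicted malignant
    tn: int  # benign cases predicted benign
    fn: int  # malignant cases predicted benign


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of malignant cases that the model flags as malignant."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of benign cases that the model flags as benign."""
    return c.tn / (c.tn + c.fp)


def positive_predictive_value(c: ConfusionCounts) -> float:
    """Fraction of malignant predictions that are actually malignant."""
    return c.tp / (c.tp + c.fp)


# ASSUMPTION: hypothetical counts chosen only because they are consistent
# with the percentages quoted in the abstract; the source does not report them.
counts = ConfusionCounts(tp=13, fp=1, tn=4, fn=2)

print(f"sensitivity = {sensitivity(counts):.1%}")
print(f"specificity = {specificity(counts):.1%}")
print(f"PPV         = {positive_predictive_value(counts):.1%}")
```

Unlike sensitivity and specificity, the positive predictive value depends on how common malignancy is in the evaluated sample, so it may not carry over to a population with a different prevalence.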