Paper Title
A Novel Tool for the Accurate and Affordable Early Diagnosis of Pancreatic Cancer via Machine Learning and Bioinformatics
Authors
Abstract
Pancreatic cancer (PC) is the fourth leading cause of cancer death in the United States, with a five-year survival rate of 10%. Late diagnosis is associated with the asymptomatic nature of the early stages and with the location of the cancer with respect to the pancreas, which renders current widely accepted screening methods unusable. Prior studies have achieved low diagnostic accuracy (70-75%), possibly because 80% of PC cases are associated with diabetes, leading to misdiagnosis. To address frequent late diagnosis and misdiagnosis, we developed an accessible, accurate, and affordable diagnostic tool for PC by analyzing the expression of nineteen genes in PC and diabetes. First, machine learning algorithms were trained on four groups of subjects, defined by the occurrence of PC and diabetes. The models were evaluated on 400 PC subjects at varying stages to ensure validity. Naive Bayes, Neural Network, and K-Nearest Neighbors models achieved the highest testing accuracy, around 92.6%. Second, the biological significance of the nineteen genes was investigated using bioinformatics tools. These genes were found to be significantly involved in regulating the cytoplasm, cytoskeleton, and nuclear receptor activity in the pancreas, specifically in acinar and ductal cells. Our novel tool is the first in the literature to achieve a PC diagnostic accuracy above 90%, and it has the potential to significantly improve the detection of PC in the background of diabetes and to increase the five-year survival rate.
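The classification setup described in the abstract (nineteen expression features per subject, four subject groups) can be illustrated with a minimal K-Nearest Neighbors sketch. This is not the authors' pipeline: the data are synthetic, the group labels, the group means, the 80/20 split, and the choice of k = 5 are all assumptions made for illustration, and the resulting accuracy says nothing about the 92.6% reported above.

```python
import math
import random
from collections import Counter

# Hypothetical labels for the four subject groups (assumed, not from the paper).
GROUPS = ["PC+diabetes", "PC only", "diabetes only", "neither"]
N_GENES = 19  # one expression value per gene, per subject


def make_synthetic_subjects(n_per_group, rng):
    """Generate made-up expression profiles; NOT real patient data."""
    samples = []
    for g, label in enumerate(GROUPS):
        for _ in range(n_per_group):
            # Each group gets its own mean (2.0 * g) so the toy problem is separable.
            profile = [rng.gauss(2.0 * g, 0.5) for _ in range(N_GENES)]
            samples.append((profile, label))
    rng.shuffle(samples)
    return samples


def knn_predict(train, query, k=5):
    """Classify `query` by majority vote among its k nearest training profiles."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=5):
    """Fraction of held-out subjects whose group is predicted correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
data = make_synthetic_subjects(50, rng)
split = int(0.8 * len(data))  # assumed 80/20 train/test split
train_set, test_set = data[:split], data[split:]
test_accuracy = accuracy(train_set, test_set)
print(f"held-out accuracy on synthetic data: {test_accuracy:.3f}")
```

With real expression data, the synthetic generator would be replaced by measured profiles, and the features would typically be normalized before computing distances.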