Paper title

Machine learning using longitudinal prescription and medical claims for the detection of nonalcoholic steatohepatitis (NASH)

Authors

Yasar, Ozge; Long, Patrick; Harder, Brett; Marshall, Hanna; Bhasin, Sanjay; Lee, Suyin; Delegge, Mark; Roy, Stephanie; Doyle, Orla; Leavitt, Nadea; Rigg, John

Abstract

Objectives: To develop and evaluate machine learning models to detect suspected undiagnosed nonalcoholic steatohepatitis (NASH) patients for diagnostic screening and clinical management.

Methods: In this retrospective observational noninterventional study using administrative medical claims data from 1,463,089 patients, gradient-boosted decision trees were trained to detect likely NASH patients from an at-risk patient population with a history of obesity, type 2 diabetes mellitus (T2DM), metabolic disorder, or nonalcoholic fatty liver (NAFL). Models were trained to detect likely NASH in all at-risk patients or in the subset without a prior NAFL diagnosis (non-NAFL at-risk patients). Models were trained and validated using retrospective medical claims data and assessed using the areas under the precision-recall and receiver operating characteristic curves (AUPRC, AUROC).

Results: The 6-month incidence of NASH in claims data was 1 per 1,437 at-risk patients and 1 per 2,127 non-NAFL at-risk patients. The model trained to detect NASH in all at-risk patients had an AUPRC of 0.0107 (95% CI 0.0104-0.011) and an AUROC of 0.84. At 10% recall, model precision was 4.3%, which is 60x above NASH incidence. The model trained to detect NASH in non-NAFL patients had an AUPRC of 0.003 (95% CI 0.0029-0.0031) and an AUROC of 0.78. At 10% recall, model precision was 1%, which is 20x above NASH incidence.

Conclusion: The low incidence of NASH in medical claims data corroborates the pattern of NASH underdiagnosis in clinical practice. Claims-based machine learning could facilitate the detection of probable NASH patients for diagnostic testing and disease management.
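The "60x" and "20x" figures in the Results can be reproduced from the reported incidence and precision values. The sketch below is illustrative arithmetic only, not the authors' code. The function name `lift_over_incidence` is hypothetical, and the AUPRC comparison assumes the standard result that a random classifier's expected AUPRC is approximately the class prevalence.

```python
def lift_over_incidence(value: float, patients_per_case: float) -> float:
    """Ratio of a precision-like value to the baseline incidence.

    `patients_per_case` is N in "1 case per N patients" (hypothetical
    parameter name; the figures themselves come from the abstract).
    """
    baseline_incidence = 1.0 / patients_per_case
    return value / baseline_incidence


# Precision at 10% recall, relative to the 6-month NASH incidence.
lift_all_at_risk = lift_over_incidence(0.043, 1437)  # all at-risk patients
lift_non_nafl = lift_over_incidence(0.01, 2127)      # non-NAFL at-risk patients

# Assumption: a random classifier's AUPRC is roughly the prevalence, so the
# same ratio gives a rough sense of how far each AUPRC sits above chance.
auprc_ratio_all_at_risk = lift_over_incidence(0.0107, 1437)
auprc_ratio_non_nafl = lift_over_incidence(0.003, 2127)

print(f"Precision lift, all at-risk: {lift_all_at_risk:.1f}x")
print(f"Precision lift, non-NAFL:    {lift_non_nafl:.1f}x")
print(f"AUPRC vs. prevalence, all at-risk: {auprc_ratio_all_at_risk:.1f}x")
print(f"AUPRC vs. prevalence, non-NAFL:    {auprc_ratio_non_nafl:.1f}x")
```

The precision lifts come out near 62 and 21, consistent with the rounded 60x and 20x in the abstract. With incidence this low, the small absolute AUPRC values are still well above the chance-level baseline.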
