Paper title

A novel method for Causal Structure Discovery from EHR data, a demonstration on type-2 diabetes mellitus

Authors

Xinpeng Shen, Sisi Ma, Prashanthi Vemuri, M. Regina Castro, Pedro J. Caraballo, Gyorgy J. Simon

Abstract

Introduction: The discovery of causal mechanisms underlying diseases enables better diagnosis, prognosis, and treatment selection. Clinical trials have been the gold standard for determining causality, but they are resource intensive and sometimes infeasible or unethical. Electronic Health Records (EHR) contain a wealth of real-world data that holds promise for the discovery of disease mechanisms, yet existing causal structure discovery (CSD) methods fall short in leveraging them because of the special characteristics of EHR data. We propose a new data transformation method and a novel CSD algorithm to overcome the challenges posed by these characteristics. Materials and methods: We demonstrated the proposed methods on an application to type-2 diabetes mellitus. We used a large EHR data set from Mayo Clinic to internally evaluate the proposed transformation and CSD methods, and another large data set from an independent health system, Fairview Health Services, for external validation. We compared the performance of our proposed method with Fast Greedy Equivalence Search (FGES), a state-of-the-art CSD method, in terms of correctness, stability, and completeness. We tested the generalizability of the proposed algorithm through external validation. Results and conclusions: The proposed method improved over the existing methods by successfully incorporating study design considerations, was robust in the face of unreliable EHR timestamps, and inferred causal effect directions more correctly and reliably. The proposed data transformation successfully improved the clinical correctness of the discovered graph and the consistency of edge orientation across bootstrap samples. It resulted in superior accuracy, stability, and completeness.
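
The abstract reports "consistency of edge orientation across bootstrap samples" but does not say how it was measured. As an illustration only, the sketch below shows one plausible way to summarize it, assuming each bootstrap run yields a set of directed edges `(cause, effect)` with at most one direction per variable pair. The function name `orientation_consistency` and the toy variables `x`, `y`, `z` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def orientation_consistency(bootstrap_graphs):
    """Summarize edge stability across bootstrap graphs.

    Assumption (not from the paper): each graph is a set of directed
    edges (cause, effect), with at most one direction per variable pair.
    Returns {majority_direction: (presence_rate, orientation_agreement)}.
    """
    # Count how often each direction appears for every unordered pair.
    direction_counts = {}
    for edges in bootstrap_graphs:
        for cause, effect in edges:
            pair = frozenset((cause, effect))
            direction_counts.setdefault(pair, Counter())[(cause, effect)] += 1

    n_graphs = len(bootstrap_graphs)
    summary = {}
    for counts in direction_counts.values():
        total = sum(counts.values())  # graphs in which the pair is adjacent
        majority_direction, majority_count = counts.most_common(1)[0]
        summary[majority_direction] = (total / n_graphs, majority_count / total)
    return summary


# Toy input: four bootstrap graphs over made-up variables x, y, z.
toy_graphs = [
    {("x", "y"), ("y", "z")},
    {("x", "y")},
    {("y", "x"), ("y", "z")},
    {("x", "y"), ("y", "z")},
]
for edge, (presence, agreement) in orientation_consistency(toy_graphs).items():
    print(edge, presence, agreement)
```

In this summary, the presence rate shows how often two variables are connected at all, and the orientation agreement shows how often the connected pair points in the majority direction. Keeping the two apart separates an edge that is rarely found from one that is found often but oriented inconsistently.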
