Deep Representation Learning of Electronic Health Records to Unlock Patient Stratification at Scale

Paper title

Deep Representation Learning of Electronic Health Records to Unlock Patient Stratification at Scale

Authors

Landi, Isotta; Glicksberg, Benjamin S.; Lee, Hao-Chih; Cherng, Sarah; Landi, Giulia; Danieletto, Matteo; Dudley, Joel T.; Furlanello, Cesare; Miotto, Riccardo

Abstract

Deriving disease subtypes from electronic health records (EHRs) can guide next-generation personalized medicine. However, challenges in summarizing and representing patient data prevent widespread practice of scalable EHR-based stratification analysis. Here we present an unsupervised framework based on deep learning to process heterogeneous EHRs and derive patient representations that can efficiently and effectively enable patient stratification at scale. We considered EHRs of 1,608,741 patients from a diverse hospital cohort comprising a total of 57,464 clinical concepts. We introduce a representation learning model based on word embeddings, convolutional neural networks, and autoencoders (i.e., ConvAE) to transform patient trajectories into low-dimensional latent vectors. We evaluated how broadly these representations enable patient stratification by applying hierarchical clustering to different multi-disease and disease-specific patient cohorts. ConvAE significantly outperformed several baselines in a clustering task to identify patients with different complex conditions, with average scores of 2.61 for entropy and 0.31 for purity. When applied to stratify patients within a certain condition, ConvAE led to various clinically relevant subtypes for different disorders, including type 2 diabetes, Parkinson's disease and Alzheimer's disease, largely related to comorbidities, disease progression, and symptom severity. With these results, we demonstrate that ConvAE can generate patient representations that lead to clinically meaningful insights. This scalable framework can help better understand varying etiologies in heterogeneous sub-populations and unlock patterns for EHR-based research in the realm of personalized medicine.
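
The abstract reports clustering quality as entropy and purity but does not state how they are computed. As a rough illustration, the sketch below uses the common textbook definitions: purity is the fraction of patients carrying the majority label of their cluster, and entropy is the size-weighted average of per-cluster label entropy in bits. The function name `purity_and_entropy`, the base-2 logarithm, and the toy data are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter, defaultdict
from math import log2


def purity_and_entropy(cluster_ids, true_labels):
    """Return (purity, entropy) of a clustering against reference labels.

    Hypothetical helper; the paper's exact definitions may differ.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("inputs must have the same length")
    n = len(cluster_ids)
    if n == 0:
        raise ValueError("inputs must be non-empty")

    # Count how often each reference label occurs inside each cluster.
    counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        counts[cluster][label] += 1

    purity = 0.0
    entropy = 0.0
    for label_counts in counts.values():
        size = sum(label_counts.values())
        # Purity: share of all points that carry their cluster's majority label.
        purity += max(label_counts.values()) / n
        # Entropy: label entropy within the cluster, weighted by cluster size.
        h = -sum((k / size) * log2(k / size) for k in label_counts.values())
        entropy += (size / n) * h
    return purity, entropy


# Toy data (made up): six patients, two clusters, two conditions.
clusters = [0, 0, 0, 1, 1, 1]
diseases = ["T2D", "T2D", "PD", "PD", "PD", "PD"]
purity, entropy = purity_and_entropy(clusters, diseases)
print(f"purity={purity:.3f} entropy={entropy:.3f}")
```

Under these definitions, higher purity and lower entropy both indicate clusters that align more closely with the reference disease labels.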
