Deep Representation Learning of Patient Data from Electronic Health Records (EHR): A Systematic Review

Paper title

Deep Representation Learning of Patient Data from Electronic Health Records (EHR): A Systematic Review

Authors

Si, Yuqi, Du, Jingcheng, Li, Zhao, Jiang, Xiaoqian, Miller, Timothy, Wang, Fei, Zheng, W. Jim, Roberts, Kirk

Abstract

Patient representation learning refers to learning a dense mathematical representation of a patient that encodes meaningful information from Electronic Health Records (EHRs). This is generally performed using advanced deep learning methods. This study presents a systematic review of this field and provides both qualitative and quantitative analyses from a methodological perspective. We identified studies developing patient representations from EHRs with deep learning methods in MEDLINE, EMBASE, Scopus, the Association for Computing Machinery (ACM) Digital Library, and the Institute of Electrical and Electronics Engineers (IEEE) Xplore Digital Library. After screening 363 articles, we included 49 papers for comprehensive data collection. We observed a typical workflow that begins with feeding in raw data, applies deep learning models, and ends with clinical outcome prediction as an evaluation of the learned representations. Specifically, learning representations from structured EHR data was dominant (37 of 49 studies). Recurrent neural networks were widely applied as the deep learning architecture (long short-term memory (LSTM): 13 studies; gated recurrent unit (GRU): 11 studies). Disease prediction was the most common application and evaluation (31 studies). Benchmark datasets were mostly unavailable (28 studies) owing to privacy concerns around EHR data, and code was made available in 20 studies. Through this systematic review, we show the importance and feasibility of learning comprehensive representations of patient EHR data. Advances in patient representation learning techniques will be essential for powering patient-level EHR analyses. Future work will continue to focus on leveraging the richness and potential of available EHR data, and knowledge distillation and other advanced learning techniques will be exploited to further improve the learning of patient representations.
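The workflow summarised in the abstract (raw structured EHR data, then a recurrent deep learning model, then a clinical outcome prediction) can be sketched as below. This is a minimal, hypothetical illustration, not the method of any reviewed study. It assumes that a patient is a chronologically ordered list of visits, each a set of medical codes; that a visit vector is the sum of code embeddings; and that the final hidden state of a GRU serves as the patient representation, which a logistic head turns into the probability of a binary disease outcome. The class `GRUPatientEncoder`, its parameters, and the example codes (`dx_hypertension` and so on) are invented for illustration. The weights are random and untrained, so the printed probability only demonstrates shapes and data flow.

```python
import math
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(*vs):
    """Element-wise sum of equal-length vectors."""
    return [sum(xs) for xs in zip(*vs)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class GRUPatientEncoder:
    """Hypothetical encoder: ordered visits (sets of codes) -> one dense patient vector.

    Weights are random and untrained; the class only illustrates the data flow.
    """

    def __init__(self, vocab, emb_dim=8, hidden_dim=6, seed=0):
        rng = random.Random(seed)
        self.index = {code: i for i, code in enumerate(vocab)}
        self.emb_dim, self.hidden_dim = emb_dim, hidden_dim
        # One embedding row per medical code in the vocabulary.
        self.emb = rand_matrix(rng, len(vocab), emb_dim)
        # GRU parameters for the update (z), reset (r) and candidate (h) gates.
        self.W = {g: rand_matrix(rng, hidden_dim, emb_dim) for g in "zrh"}
        self.U = {g: rand_matrix(rng, hidden_dim, hidden_dim) for g in "zrh"}
        self.b = {g: [0.0] * hidden_dim for g in "zrh"}
        # Logistic-regression head for one binary outcome (e.g. disease onset).
        self.head_w = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
        self.head_b = 0.0

    def embed_visit(self, codes):
        """Visit vector = sum of the embeddings of its known codes (unknown codes are skipped)."""
        vec = [0.0] * self.emb_dim
        for code in codes:
            if code in self.index:
                vec = add(vec, self.emb[self.index[code]])
        return vec

    def gate(self, g, x, h, act):
        """Compute act(W_g x + U_g h + b_g) element-wise."""
        return [act(v) for v in add(matvec(self.W[g], x), matvec(self.U[g], h), self.b[g])]

    def gru_step(self, x, h):
        z = self.gate("z", x, h, sigmoid)  # update gate
        r = self.gate("r", x, h, sigmoid)  # reset gate
        rh = [ri * hi for ri, hi in zip(r, h)]
        h_cand = self.gate("h", x, rh, math.tanh)  # candidate state uses reset-scaled h
        # Interpolate between the previous state and the candidate.
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, h_cand)]

    def encode(self, visits):
        """Run the GRU over chronologically ordered visits; the last hidden state is the representation."""
        h = [0.0] * self.hidden_dim
        for codes in visits:
            h = self.gru_step(self.embed_visit(codes), h)
        return h

    def predict(self, visits):
        """Probability of the (hypothetical) outcome computed from the patient representation."""
        rep = self.encode(visits)
        return sigmoid(sum(w * x for w, x in zip(self.head_w, rep)) + self.head_b)


if __name__ == "__main__":
    vocab = ["dx_hypertension", "dx_type2_diabetes", "rx_metformin", "lab_hba1c_high"]
    model = GRUPatientEncoder(vocab)
    patient = [["dx_hypertension"], ["lab_hba1c_high", "dx_type2_diabetes"], ["rx_metformin"]]
    print(len(model.encode(patient)))  # 6: dimensionality of the patient vector
    print(model.predict(patient))      # a probability in (0, 1); meaningless since untrained
```

In a real setting the embeddings, GRU weights, and prediction head would be trained jointly on labelled outcomes, typically with a deep learning framework such as PyTorch; replacing `gru_step` with an LSTM cell gives the other recurrent architecture the abstract mentions.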
