Label Dependent Attention Model for Disease Risk Prediction Using Multimodal Electronic Health Records

Paper Title

Label Dependent Attention Model for Disease Risk Prediction Using Multimodal Electronic Health Records

Authors

Shuai Niu, Qing Yin, Yunya Song, Yike Guo, Xian Yang

Abstract

Disease risk prediction has attracted increasing attention in modern healthcare, especially with recent advances in artificial intelligence (AI). Electronic health records (EHRs), which contain heterogeneous patient information, are widely used in disease risk prediction tasks. One challenge of applying AI models to risk prediction is generating interpretable evidence to support the predictions while retaining predictive power. To address this problem, we previously proposed jointly embedding words and labels, whereby attention modules learn the weights of words in medical notes according to their relevance to the names of the risk prediction labels. This approach improves interpretability by employing an attention mechanism and including the names of the prediction tasks in the model. However, it is limited to textual inputs such as medical notes. In this paper, we propose a label-dependent attention model, LDAM, that 1) improves interpretability by exploiting Clinical-BERT (a biomedical language model pre-trained on a large clinical corpus) to jointly encode biomedically meaningful features and labels; and 2) extends the idea of joint embedding to time-series data, within a multimodal learning framework that integrates heterogeneous information from medical notes and time-series health status indicators. To demonstrate our method, we apply LDAM to the MIMIC-III dataset to predict the risks of different diseases. We evaluate the method both quantitatively and qualitatively: we report LDAM's predictive performance and present case studies that illustrate its interpretability.
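The core idea of label-dependent attention, where label-name embeddings act as queries that weight the elements of an input sequence, can be sketched in a few lines. This is a minimal NumPy sketch under stated assumptions, not the authors' LDAM implementation: it assumes cosine-similarity scoring between label embeddings and input feature vectors, a shared embedding dimension for note words and time-series steps, concatenation as the fusion step, and a per-label logistic head. The function names (`label_dependent_attention`, `fuse_modalities`, `predict_risk`) and all shapes are hypothetical. In LDAM itself the word and label embeddings come from Clinical-BERT, which this sketch does not reproduce.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def label_dependent_attention(features, label_emb):
    """Attend over a sequence using label-name embeddings as queries (hypothetical sketch).

    features:  (n, d) non-zero vectors, e.g. word embeddings of a note or
               per-time-step embeddings of a health-indicator series.
    label_emb: (k, d) non-zero vectors, one per risk-prediction label name.
    Returns weights (k, n), each row summing to 1, and context (k, d).
    """
    # Assumption: cosine similarity between every label and every position.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    l = label_emb / np.linalg.norm(label_emb, axis=1, keepdims=True)
    scores = l @ f.T                   # (k, n) label-to-position relevance
    weights = softmax(scores, axis=1)  # normalize over positions per label
    context = weights @ features       # (k, d) label-specific summary
    return weights, context

def fuse_modalities(note_feats, series_feats, label_emb):
    # Assumption: attend over each modality with the same label embeddings,
    # then concatenate the label-specific summaries (simple late fusion).
    w_text, c_text = label_dependent_attention(note_feats, label_emb)
    w_ts, c_ts = label_dependent_attention(series_feats, label_emb)
    return np.concatenate([c_text, c_ts], axis=1), w_text, w_ts

def predict_risk(fused, w, b):
    # Hypothetical per-label logistic head: one weight row and bias per label.
    return 1.0 / (1.0 + np.exp(-((fused * w).sum(axis=1) + b)))

# Usage with random stand-in embeddings (no real EHR data).
rng = np.random.default_rng(0)
words = rng.normal(size=(6, 8))    # 6 note tokens, 8-dim embeddings
steps = rng.normal(size=(4, 8))    # 4 time steps projected to 8 dims
labels = rng.normal(size=(3, 8))   # 3 risk-label name embeddings
fused, w_text, w_ts = fuse_modalities(words, steps, labels)
print(fused.shape)                 # (3, 16)
print(w_text.argmax(axis=1))       # most-attended token index per label
```

The interpretability argument rests on `w_text` and `w_ts`: each row shows which note tokens or time steps a given risk label attended to, so the highest-weighted positions can be surfaced as supporting evidence for that label's prediction.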
