Paper title
A Kernel to Exploit Informative Missingness in Multivariate Time Series from EHRs
Paper authors
Paper abstract
A large fraction of electronic health records (EHRs) consists of clinical measurements collected over time, such as lab tests and vital signs, which provide important information about a patient's health status. These sequences of clinical measurements are naturally represented as time series, characterized by multiple variables and large amounts of missing data, which complicate the analysis. In this work, we propose a novel kernel that is capable of exploiting both the information in the observed values and the information hidden in the missingness patterns of multivariate time series (MTS) originating, e.g., from EHRs. The kernel, called TCK$_{IM}$, is designed using an ensemble learning strategy in which the base models are novel mixed-mode Bayesian mixture models that can effectively exploit informative missingness without resorting to imputation. Moreover, the ensemble approach ensures robustness to hyperparameters, which makes TCK$_{IM}$ particularly well suited to settings where labels are scarce, a known challenge in medical applications. Experiments on three real-world clinical datasets demonstrate the effectiveness of the proposed kernel.
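The ensemble idea in the abstract can be illustrated with a rough sketch. It is not the authors' implementation: it assumes a plain maximum-likelihood EM in place of the Bayesian mixture models, and the function names `fit_mixed_mixture` and `ensemble_kernel`, as well as all default parameter values, are hypothetical. Each ensemble member fits a small mixture with a Gaussian component on the observed values and a Bernoulli component on the missingness indicators, using a random subset of variables, a random time segment, and a random number of clusters. The kernel is the average inner product of the resulting posterior cluster memberships, so no imputation is needed.

```python
import numpy as np

def fit_mixed_mixture(X, R, n_clusters, n_iter, rng):
    """Toy EM: Gaussian on observed values, Bernoulli on missingness.

    X: (n, d) values with missing entries already set to 0.
    R: (n, d) indicators, 1.0 = observed, 0.0 = missing.
    Returns the (n, n_clusters) posterior cluster memberships.
    """
    Xz = X * R                                          # ignore missing entries
    post = rng.dirichlet(np.ones(n_clusters), size=X.shape[0])
    for _ in range(n_iter):
        # M-step: weighted estimates from observed entries only.
        w = post.T @ R + 1e-6                           # (k, d) effective counts
        mu = (post.T @ Xz) / w
        diff = Xz[:, None, :] - mu[None, :, :]          # (n, k, d)
        sq = diff ** 2 * R[:, None, :]
        var = np.einsum("nk,nkd->kd", post, sq) / w + 1e-3
        # Smoothed probability that an entry is observed, per cluster.
        beta = (post.T @ R + 1.0) / (post.sum(0)[:, None] + 2.0)
        pi = post.mean(0)
        # E-step: Gaussian term for observed entries, Bernoulli term for all.
        log_gauss = -0.5 * (np.log(2 * np.pi * var)[None] + diff ** 2 / var[None])
        log_p = (log_gauss * R[:, None, :]).sum(-1)
        log_p += R @ np.log(beta).T + (1.0 - R) @ np.log(1.0 - beta).T
        log_p += np.log(pi + 1e-12)
        log_p -= log_p.max(1, keepdims=True)            # numerical stability
        post = np.exp(log_p)
        post /= post.sum(1, keepdims=True)
    return post

def ensemble_kernel(X, n_members=30, max_clusters=4, n_iter=20, seed=0):
    """X: (n, T, V) array with NaN marking missing values.

    Returns an (n, n) kernel matrix averaged over ensemble members.
    """
    rng = np.random.default_rng(seed)
    n, T, V = X.shape
    R_full = (~np.isnan(X)).astype(float)
    X_full = np.nan_to_num(X)                           # NaN -> 0, masked by R
    K = np.zeros((n, n))
    for _ in range(n_members):
        # Randomise the hyperparameters and the view of the data.
        k = rng.integers(2, max_clusters + 1)
        variables = rng.choice(V, size=rng.integers(1, V + 1), replace=False)
        length = rng.integers(1, T + 1)
        start = rng.integers(0, T - length + 1)
        seg = slice(start, start + length)
        Xs = X_full[:, seg, :][:, :, variables].reshape(n, -1)
        Rs = R_full[:, seg, :][:, :, variables].reshape(n, -1)
        post = fit_mixed_mixture(Xs, Rs, k, n_iter, rng)
        K += post @ post.T                              # member-level Gram matrix
    return K / n_members
```

Because each member contributes a Gram matrix of probability vectors, the averaged matrix is symmetric and positive semidefinite with entries in [0, 1]. It can therefore be passed to any kernel method that accepts a precomputed kernel. Randomising the hyperparameters across members, rather than tuning them, is what makes this construction usable without labels.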