Title
The Impact of Time Series Length and Discretization on Longitudinal Causal Estimation Methods
Authors
Abstract
The use of observational time series data to assess the impact of multi-time-point interventions is becoming increasingly common as more health and activity data are collected and digitized via wearables, social media, and electronic health records. Such time series may involve hundreds or thousands of irregularly sampled observations. One common analysis approach is to simplify such time series by first discretizing them into sequences before applying a discrete-time estimation method that adjusts for time-dependent confounding. In certain settings, this discretization results in sequences with many time points; however, the empirical properties of longitudinal causal estimators have not been systematically compared on long sequences. We compare three representative longitudinal causal estimation methods on simulated and real clinical data. Our simulations and analyses assume a Markov structure and that longitudinal treatments/exposures are binary-valued and have at most a single jump point. We identify sources of bias that arise from temporally discretizing the data and provide practical guidance for discretizing data and choosing between methods when working with long sequences. Additionally, we compare these estimators on real electronic health record data, evaluating the impact of early treatment for patients with a life-threatening complication of infection called sepsis.
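To make the discretization step concrete, here is a minimal sketch of one way irregularly sampled observations might be binned into a fixed-width sequence, together with a binary treatment that has at most a single jump point. It is not the authors' procedure. The function names `discretize` and `treatment_sequence`, the choice to average observations within a bin, and the choice to carry the last value forward into empty bins are all illustrative assumptions. The example shows one plausible way binning can blur timing: with wide bins, a covariate measured before treatment can land in the same bin as the treatment jump, so the discrete sequence no longer records which came first.

```python
import math
from typing import List, Optional


def discretize(times: List[float], values: List[float],
               horizon: float, width: float) -> List[float]:
    """Hypothetical helper: average irregular observations into fixed-width bins.

    Assumption: empty bins carry the last observed bin value forward
    (NaN before the first observation).
    """
    n_bins = math.ceil(horizon / width)
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, v in zip(times, values):
        k = min(int(t // width), n_bins - 1)  # bin index for time t
        sums[k] += v
        counts[k] += 1
    out: List[float] = []
    last = float("nan")
    for s, c in zip(sums, counts):
        if c:
            last = s / c  # mean of observations in this bin
        out.append(last)
    return out


def treatment_sequence(jump_time: Optional[float],
                       horizon: float, width: float) -> List[int]:
    """Hypothetical helper: binary treatment with at most one jump (0 -> 1).

    Assumption: a bin is labeled treated if it contains the jump or follows it.
    """
    n_bins = math.ceil(horizon / width)
    if jump_time is None:
        return [0] * n_bins  # never treated
    k = int(jump_time // width)
    return [1 if i >= k else 0 for i in range(n_bins)]


if __name__ == "__main__":
    times = [0.2, 0.9, 1.1, 2.5, 5.5]  # irregular sampling times (hours)
    values = [1.0, 3.0, 2.0, 4.0, 6.0]
    for width in (2.0, 1.0):
        print(width,
              discretize(times, values, 6.0, width),
              treatment_sequence(3.9, 6.0, width))
```

With 2-hour bins, the observation at hour 2.5 and the treatment jump at hour 3.9 both fall in bin 1, so the sequence `[2.0, 4.0, 6.0]` / `[0, 1, 1]` treats that covariate value as contemporaneous with treatment. With 1-hour bins they separate (bin 2 versus bin 3), at the cost of a longer sequence with more carried-forward values. This is the kind of trade-off between bin width and sequence length that the abstract refers to.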