Paper Title
Recurrent event analysis in the presence of real-time high frequency data via random subsampling
Paper Authors
Paper Abstract
Digital monitoring studies collect real-time, high-frequency data via mobile sensors in subjects' natural environments. These data can be used to model the impact of changes in physiology on recurrent event outcomes such as smoking, drug use, alcohol use, or self-identified moments of suicidal ideation. Likelihood calculations for the recurrent event analysis, however, become computationally prohibitive in this setting. Motivated by this, a random subsampling framework is proposed for computationally efficient, approximate likelihood-based estimation. A subsampling-unbiased estimator for the derivative of the cumulative hazard enters into an approximation of the log-likelihood. The estimator has two sources of variation: the first due to the recurrent event model and the second due to subsampling. The latter can be reduced by increasing the sampling rate; however, this leads to increased computational costs. The approximate score equations are equivalent to logistic regression score equations, allowing standard, "off-the-shelf" software to be used in fitting these models. Simulations demonstrate the method and the efficiency-computation trade-off. We end by illustrating our approach using data from a digital monitoring study of suicidal ideation.
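The link between subsampling and logistic regression can be illustrated with a small simulation. The sketch below is not the authors' implementation; it rests on several simplifying assumptions that do not come from the abstract: events follow a Poisson process with a log-linear hazard h(t) = exp(theta0 + theta1 * x(t)), the covariate x(t) = sin(t) stands in for a sensor signal, and non-event times are drawn from an independent homogeneous Poisson process with a constant rate. Under those assumptions, a standard property of superposed Poisson processes says that each pooled time point is an event with probability h(t) / (h(t) + rate), which is a logistic model with offset -log(rate). All function and variable names are hypothetical.

```python
import numpy as np


def simulate_events(theta, horizon, rng):
    """Simulate event times on [0, horizon] by thinning, for the assumed
    hazard h(t) = exp(theta[0] + theta[1] * sin(t))."""
    h_max = np.exp(theta[0] + abs(theta[1]))  # upper bound on the hazard
    n = rng.poisson(h_max * horizon)
    candidates = rng.uniform(0.0, horizon, size=n)
    h = np.exp(theta[0] + theta[1] * np.sin(candidates))
    keep = rng.uniform(size=n) < h / h_max  # accept with probability h / h_max
    return np.sort(candidates[keep])


def subsample_times(rate, horizon, rng):
    """Draw non-event times from a homogeneous Poisson process (assumed design)."""
    n = rng.poisson(rate * horizon)
    return np.sort(rng.uniform(0.0, horizon, size=n))


def fit_logistic_with_offset(X, y, offset, n_iter=25):
    """Newton-Raphson for logistic regression with a fixed offset term."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta + offset)))
        grad = X.T @ (y - p)                            # logistic score
        hess = (X * (p * (1.0 - p))[:, None]).T @ X     # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def estimate_theta(event_times, sampled_times, rate):
    """Pool event and subsampled times, then fit the offset logistic model."""
    t = np.concatenate([event_times, sampled_times])
    y = np.concatenate([np.ones(len(event_times)), np.zeros(len(sampled_times))])
    X = np.column_stack([np.ones_like(t), np.sin(t)])   # intercept and x(t)
    offset = np.full(len(t), -np.log(rate))             # accounts for the sampling rate
    return fit_logistic_with_offset(X, y, offset)


rng = np.random.default_rng(0)
theta_true = np.array([-1.0, 0.8])
horizon = 5000.0
events = simulate_events(theta_true, horizon, rng)

# A higher sampling rate means more rows to fit, but less subsampling noise.
for rate in (0.5, 4.0):
    sampled = subsample_times(rate, horizon, rng)
    theta_hat = estimate_theta(events, sampled, rate)
    print(f"rate={rate}: rows={len(events) + len(sampled)}, theta_hat={theta_hat}")
```

In this toy setting the covariate only needs to be evaluated at the event times and the subsampled times, not on the full high-frequency grid, which is where the computational saving would come from. Repeating the fit over many random subsamples at each rate would give a rough view of the efficiency-computation trade-off described in the abstract.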