Paper Title
Outcome Adaptive Propensity Score Methods for Handling Censoring and High-Dimensionality: Application to Insurance Claims
Paper Authors
Paper Abstract
Propensity scores are commonly used to reduce confounding bias in non-randomized observational studies when estimating the average treatment effect. An important assumption underlying this approach is that all confounders associated with both the treatment and the outcome of interest are measured and included in the propensity score model. In the absence of strong prior knowledge about potential confounders, researchers may agnostically want to adjust for a high-dimensional set of pre-treatment variables. As such, a variable selection procedure is needed for propensity score estimation. In addition, recent studies show that including variables related only to the treatment in the propensity score model may inflate the variance of the treatment effect estimates, while including variables that are predictive of only the outcome can improve efficiency. In this paper, we propose a flexible approach to incorporating the outcome-covariate relationship into the propensity score model by including the predicted binary outcome probability (OP) as a covariate. Our approach can be easily adapted to an ensemble of variable selection methods, including regularization methods and modern machine learning tools based on classification and regression trees. We evaluate our method for estimating treatment effects on a binary outcome, which is possibly censored, among multiple treatment groups. Simulation studies indicate that incorporating the OP when estimating the propensity scores can improve statistical efficiency and protect against model misspecification. The proposed methods are applied to a cohort of advanced-stage prostate cancer patients identified from a private insurance claims database to compare the adverse effects of four commonly used drugs for treating castration-resistant prostate cancer.
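The core idea in the abstract, adding the predicted outcome probability (OP) as a covariate in the propensity score model, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a binary treatment (the paper considers multiple treatment groups), no censoring, no variable selection step, plain logistic regression for both models, an outcome model fitted on covariates only, and a normalized inverse probability weighting estimator. The function names (fit_logistic, simulate, op_adjusted_ipw) and the simulated data are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, n_iter=300):
    # Logistic regression with intercept, fitted by batch gradient ascent.
    # Returns weights w, where w[0] is the intercept.
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            lin = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = yi - sigmoid(lin)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict_proba(w, X):
    return [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in X]


def simulate(n=500, seed=0):
    # Hypothetical data: x[0] is a confounder, x[1] affects treatment only,
    # x[2] affects the outcome only.
    rng = random.Random(seed)
    X, t, y = [], [], []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(3)]
        ti = 1 if rng.random() < sigmoid(0.5 * x[0] + 0.8 * x[1]) else 0
        p_y = sigmoid(-0.5 + 0.7 * x[0] + 0.9 * x[2] + 0.5 * ti)
        yi = 1 if rng.random() < p_y else 0
        X.append(x)
        t.append(ti)
        y.append(yi)
    return X, t, y


def op_adjusted_ipw(X, t, y, clip=0.01):
    # Step 1 (assumption): outcome model on covariates only gives the OP.
    op = predict_proba(fit_logistic(X, y), X)
    # Step 2: propensity score model with the OP appended as a covariate.
    X_aug = [xi + [oi] for xi, oi in zip(X, op)]
    ps = predict_proba(fit_logistic(X_aug, t), X_aug)
    ps = [min(max(p, clip), 1.0 - clip) for p in ps]  # avoid extreme weights
    # Step 3 (assumption): normalized IPW estimate of the risk difference.
    w1 = [ti / pi for ti, pi in zip(t, ps)]
    w0 = [(1 - ti) / (1.0 - pi) for ti, pi in zip(t, ps)]
    mu1 = sum(w * yi for w, yi in zip(w1, y)) / sum(w1)
    mu0 = sum(w * yi for w, yi in zip(w0, y)) / sum(w0)
    return mu1 - mu0, ps


if __name__ == "__main__":
    X, t, y = simulate()
    ate, ps = op_adjusted_ipw(X, t, y)
    print("Estimated risk difference:", round(ate, 3))
```

In a high-dimensional setting, both logistic fits would presumably be replaced by a regularized or tree-based learner of the kind the abstract mentions; the sketch leaves that step out to keep the OP-as-covariate mechanism visible.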