Paper Title

Data-driven Automated Negative Control Estimation (DANCE): Search for, Validation of, and Causal Inference with Negative Controls

Authors

Erich Kummerfeld, Jaewon Lim, Xu Shi

Abstract

Negative control variables are increasingly used to adjust for unmeasured confounding bias in causal inference using observational data. They are typically identified by subject-matter knowledge, and there is currently a severe lack of data-driven methods to find negative controls. In this paper, we present a statistical test for discovering negative controls of a special type -- disconnected negative controls -- that can serve as surrogates of the unmeasured confounder, and we incorporate that test into the Data-driven Automated Negative Control Estimation (DANCE) algorithm. DANCE first uses the new validation test to identify subsets of a set of candidate negative control variables that satisfy the assumptions of disconnected negative controls. It then applies a negative control method to each pair of these validated negative control variables, and aggregates the output to produce an unbiased point estimate and confidence interval for a causal effect in the presence of unmeasured confounding. We (1) prove the correctness of this validation test, and thus of DANCE; (2) demonstrate via simulation experiments that DANCE outperforms both a naive analysis that ignores unmeasured confounding and a negative control method applied to randomly selected candidate negative controls; and (3) demonstrate the effectiveness of DANCE on a challenging real-world problem.
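
The "apply a negative control method to each pair, then aggregate" step in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the linear simulated data, the use of a two-stage least-squares estimator as the per-pair negative control method, the median as the aggregation rule, and the names `two_stage_nc_estimate`, `controls`, and `pooled_est`. The validation test itself is not specified in the abstract, so the sketch skips it and simply simulates three controls that are valid by construction (each depends only on the unmeasured confounder `u`).

```python
import itertools
import numpy as np


def ols(X, y):
    """Least-squares coefficients for y ~ X (X must already contain an intercept column)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def two_stage_nc_estimate(y, a, z, w):
    """Effect of a on y using one pair of negative controls (z, w).

    Assumed method (not taken from the paper): two-stage least squares,
    where w is first projected onto (a, z) and y is then regressed on
    (a, projected w). Under a linear model the coefficient on a recovers
    the causal effect despite the unmeasured confounder.
    """
    ones = np.ones(len(y))
    stage1 = np.column_stack([ones, a, z])
    w_hat = stage1 @ ols(stage1, w)          # fitted values of w given (a, z)
    stage2 = np.column_stack([ones, a, w_hat])
    return ols(stage2, y)[1]                 # coefficient on the exposure a


# Simulated data (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n = 50_000
true_effect = 2.0
u = rng.normal(size=n)                       # unmeasured confounder
a = u + rng.normal(size=n)                   # exposure, confounded by u
y = true_effect * a + u + rng.normal(size=n) # outcome, confounded by u
# Three candidate controls that depend only on u, so each pair is valid here.
controls = [u + rng.normal(size=n) for _ in range(3)]

# Naive analysis: regress y on a and ignore u.
naive_est = ols(np.column_stack([np.ones(n), a]), y)[1]

# One estimate per ordered pair of controls, pooled with a median
# (the median is a placeholder for the paper's aggregation rule).
pair_ests = [
    two_stage_nc_estimate(y, a, controls[i], controls[j])
    for i, j in itertools.permutations(range(len(controls)), 2)
]
pooled_est = float(np.median(pair_ests))

print(f"naive estimate:  {naive_est:.3f}")
print(f"pooled estimate: {pooled_est:.3f} (true effect {true_effect})")
```

In this simulated setup the naive regression absorbs part of the confounder's influence into the exposure coefficient, while the pair-based estimates do not. With real data, the controls would first have to pass a validation step such as the one the paper proposes.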
