Paper Title

Instrumental Variables in Causal Inference and Machine Learning: A Survey

Authors

Anpeng Wu, Kun Kuang, Ruoxuan Xiong, Fei Wu

Abstract

Causal inference is the process of using assumptions, study designs, and estimation strategies to draw conclusions about the causal relationships between variables based on data. This allows researchers to better understand the underlying mechanisms at work in complex systems and make more informed decisions. In many settings, we may not fully observe all the confounders that affect both the treatment and outcome variables, complicating the estimation of causal effects. To address this problem, a growing literature in both causal inference and machine learning proposes to use Instrumental Variables (IVs). This paper serves as the first effort to systematically and comprehensively introduce and discuss IV methods and their applications in both causal inference and machine learning. First, we provide the formal definition of IVs and discuss the identification problem of IV regression methods under different assumptions. Second, we categorize the existing work on IV methods into three streams according to the focus of the proposed methods: two-stage least squares with IVs, control function with IVs, and evaluation of IVs. For each stream, we present both classical causal inference methods and recent developments in the machine learning literature. Then, we introduce a variety of applications of IV methods in real-world scenarios and provide a summary of the available datasets and algorithms. Finally, we summarize the literature, discuss open problems, and suggest promising future research directions for IV methods and their applications. We also develop a toolkit of the IV methods reviewed in this survey at https://github.com/causal-machine-learning-lab/mliv.
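
To make the unobserved-confounder problem and the two-stage least squares idea mentioned in the abstract concrete, here is a minimal sketch on synthetic data. It is not taken from the survey or its toolkit: the data-generating process, the coefficient values, and the helper names `simple_ols` and `two_stage_least_squares` are all assumptions chosen for illustration, and it covers only the simplest case of one instrument, one treatment, and linear effects.

```python
import random


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y ~ a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    slope = cov_xy / var_x
    return mean_y - slope * mean_x, slope


def two_stage_least_squares(z, t, y):
    """Estimate the effect of treatment t on outcome y using instrument z."""
    # Stage 1: predict the treatment from the instrument only.
    intercept, slope = simple_ols(z, t)
    t_hat = [intercept + slope * zi for zi in z]
    # Stage 2: regress the outcome on the predicted treatment.
    _, effect = simple_ols(t_hat, y)
    return effect


# Hypothetical synthetic data (illustrative assumption, not from the survey).
rng = random.Random(0)
n = 20_000
true_effect = 2.0
u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder
z = [rng.gauss(0, 1) for _ in range(n)]  # instrument, independent of u
t = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [true_effect * ti + 3.0 * ui + rng.gauss(0, 1) for ti, ui in zip(t, u)]

_, ols_estimate = simple_ols(t, y)               # ignores the confounder
iv_estimate = two_stage_least_squares(z, t, y)   # uses the instrument

print(f"naive OLS estimate: {ols_estimate:.2f}")
print(f"2SLS estimate:      {iv_estimate:.2f}")
```

Under these assumptions, the naive regression of the outcome on the treatment absorbs the confounder's influence and overstates the effect, while the two-stage estimate should land near the true value of 2.0, because the instrument is constructed to affect the outcome only through the treatment.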
