Paper Title

Cross-Fitting and Averaging for Machine Learning Estimation of Heterogeneous Treatment Effects

Author

Jacob, Daniel

Abstract

We investigate the finite-sample performance of sample splitting, cross-fitting, and averaging for the estimation of the conditional average treatment effect. Recently proposed methods, so-called meta-learners, make use of machine learning to estimate different nuisance functions and hence allow for fewer restrictions on the underlying structure of the data. To limit the potential overfitting bias that may result from using machine learning methods, cross-fitting estimators have been proposed. This involves splitting the data into different folds to reduce bias and averaging over folds to restore efficiency. To the best of our knowledge, it is not yet clear how exactly the data should be split and averaged. We employ a Monte Carlo study with different data-generating processes and consider twelve different estimators that vary in their sample-splitting, cross-fitting, and averaging procedures. We investigate the performance of each estimator independently on four different meta-learners: the doubly robust learner (DR-learner), R-learner, T-learner, and X-learner. We find that the performance of all meta-learners depends heavily on the splitting and averaging procedure. The best performance in terms of mean squared error (MSE) among the sample-split estimators is achieved by applying cross-fitting and taking the median over multiple different sample-splitting iterations. Some meta-learners exhibit high variance when the lasso is included among the ML methods. Excluding the lasso decreases the variance and leads to robust and at least competitive results.
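The cross-fitting-plus-median procedure described in the abstract can be sketched as follows. This is a minimal illustration only: it uses a T-learner with simple linear base learners (via least squares) as a stand-in for the ML methods in the paper, and all function names are hypothetical, not from the authors' code.

```python
import numpy as np

def t_learner_fold(X_tr, y_tr, w_tr, X_te):
    # T-learner step: fit separate outcome models for treated (w=1) and
    # control (w=0) units on the training fold, then predict the CATE on
    # the held-out fold as the difference of the two predictions.
    # Linear least-squares models stand in for arbitrary ML learners.
    A_tr = np.column_stack([np.ones(len(X_tr)), X_tr])
    A_te = np.column_stack([np.ones(len(X_te)), X_te])
    beta1, *_ = np.linalg.lstsq(A_tr[w_tr == 1], y_tr[w_tr == 1], rcond=None)
    beta0, *_ = np.linalg.lstsq(A_tr[w_tr == 0], y_tr[w_tr == 0], rcond=None)
    return A_te @ beta1 - A_te @ beta0

def cross_fit_cate(X, y, w, n_folds=2, n_iter=5, seed=0):
    # Cross-fitting: each fold's CATE is predicted by models trained on the
    # remaining folds, so no observation is predicted by a model trained on
    # it. Taking the median over several sample-splitting iterations reduces
    # the dependence on any single random split.
    rng = np.random.default_rng(seed)
    n = len(y)
    estimates = np.empty((n_iter, n))
    for it in range(n_iter):
        idx = rng.permutation(n)
        folds = np.array_split(idx, n_folds)
        for k in range(n_folds):
            te = folds[k]
            tr = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            estimates[it, te] = t_learner_fold(X[tr], y[tr], w[tr], X[te])
    return np.median(estimates, axis=0)
```

With a linear data-generating process, the sketch recovers the heterogeneous effect reasonably well; with nonlinear nuisance functions, the linear base learners would of course have to be replaced by more flexible ML methods, as in the paper.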
