Paper Title
Causal Transfer Random Forest: Combining Logged Data and Randomized Experiments for Robust Prediction
Paper Authors
Paper Abstract
It is often critical for prediction models to be robust to distributional shifts between training and testing data. From a causal perspective, the challenge is to distinguish stable causal relationships from unstable spurious correlations across shifts. We describe a causal transfer random forest (CTRF) that combines existing training data with a small amount of data from a randomized experiment to train a model that is robust to feature shifts and therefore transfers to a new target distribution. Theoretically, we justify the robustness of the approach against feature shifts using knowledge from causal learning. Empirically, we evaluate the CTRF using both synthetic data experiments and real-world experiments on the Bing Ads platform, including a click prediction task and an end-to-end counterfactual optimization system. The proposed CTRF produces robust predictions and outperforms most of the baseline methods it is compared against in the presence of feature shifts.
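The abstract does not spell out how the two data sources are combined, so the following is only a minimal sketch under a stated assumption: each tree's split structure is learned from the small randomized sample alone, and the leaf values are then estimated from the pooled logged and randomized data. The function names (`grow`, `fit_ctrf_sketch`, `predict`), the synthetic data generator, and all parameter values are hypothetical and are not taken from the paper.

```python
import random

def sse(ys):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((v - m) ** 2 for v in ys)

def best_split(X, y, features):
    """Return the (feature, threshold) with the lowest squared error, or None."""
    best = None
    for j in features:
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [v for row, v in zip(X, y) if row[j] <= t]
            right = [v for row, v in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, j, t)
    return None if best is None else best[1:]

def grow(X, y, depth, rng):
    """Grow the STRUCTURE of one tree; leaves are empty accumulators."""
    if depth == 0:
        return {"sum": 0.0, "n": 0}
    n_feat = len(X[0])
    # Random feature subset per node, as in a standard random forest.
    features = rng.sample(range(n_feat), max(1, n_feat // 2))
    split = best_split(X, y, features)
    if split is None:
        return {"sum": 0.0, "n": 0}
    j, t = split
    left = [(r, v) for r, v in zip(X, y) if r[j] <= t]
    right = [(r, v) for r, v in zip(X, y) if r[j] > t]
    return {"j": j, "t": t,
            "left": grow([r for r, _ in left], [v for _, v in left], depth - 1, rng),
            "right": grow([r for r, _ in right], [v for _, v in right], depth - 1, rng)}

def leaf_of(tree, row):
    """Route a row down the tree and return the leaf it lands in."""
    while "j" in tree:
        tree = tree["left"] if row[tree["j"]] <= tree["t"] else tree["right"]
    return tree

def fit_ctrf_sketch(X_rand, y_rand, X_log, y_log, n_trees=50, depth=3, seed=0):
    """ASSUMPTION: structure from randomized data, leaf values from pooled data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample of the randomized data only.
        idx = [rng.randrange(len(X_rand)) for _ in X_rand]
        tree = grow([X_rand[i] for i in idx], [y_rand[i] for i in idx], depth, rng)
        # Fill the leaves with labels from both data sources.
        for row, v in zip(X_rand + X_log, y_rand + y_log):
            leaf = leaf_of(tree, row)
            leaf["sum"] += v
            leaf["n"] += 1
        forest.append(tree)
    return forest

def predict(forest, row):
    """Average the leaf means over all trees. Every leaf is non-empty because
    the randomized rows that shaped it are also routed through the tree."""
    leaves = [leaf_of(tree, row) for tree in forest]
    return sum(leaf["sum"] / leaf["n"] for leaf in leaves) / len(leaves)

def make_data(n, randomized, rng):
    """Hypothetical data: x0 drives clicks; x1 merely tracks x0 in logged data."""
    X, y = [], []
    for _ in range(n):
        x0 = rng.randint(0, 1)
        if randomized:
            x1 = rng.randint(0, 1)                     # set by a coin flip
        else:
            x1 = x0 if rng.random() < 0.9 else 1 - x0  # set by the logging policy
        X.append([x0, x1])
        y.append(1 if rng.random() < 0.1 + 0.6 * x0 else 0)
    return X, y

rng = random.Random(1)
X_log, y_log = make_data(2000, False, rng)   # large logged sample
X_rand, y_rand = make_data(200, True, rng)   # small randomized sample
forest = fit_ctrf_sketch(X_rand, y_rand, X_log, y_log)
p_causal_only = predict(forest, [1, 0])      # causal feature on, spurious off
p_spurious_only = predict(forest, [0, 1])    # causal feature off, spurious on
```

In this toy setup the second feature is correlated with the label only through the logging policy, so a model that transfers well should score `[1, 0]` above `[0, 1]`. The sketch illustrates the data-combination idea only and makes no claim about the paper's actual algorithm or results.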