Cross-validation Approaches for Multi-study Predictions

Title

Cross-validation Approaches for Multi-study Predictions

Authors

Boyu Ren, Prasad Patil, Francesca Dominici, Giovanni Parmigiani, Lorenzo Trippa

Abstract

We consider prediction in multiple studies with potential differences in the relationships between predictors and outcomes. Our objective is to integrate data from multiple studies to develop prediction models for unseen studies. We propose and investigate two cross-validation approaches applicable to multi-study stacking, an ensemble method that linearly combines study-specific ensemble members to produce generalizable predictions. Among our cross-validation approaches are some that avoid reuse of the same data in both the training and stacking steps, as done in earlier multi-study stacking. We prove that under mild regularity conditions the proposed cross-validation approaches produce stacked prediction functions with oracle properties. We also identify analytically in which scenarios the proposed cross-validation approaches increase prediction accuracy compared to stacking with data reuse. We perform a simulation study to illustrate these results. Finally, we apply our method to predicting mortality from long-term exposure to air pollutants, using collections of datasets.
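
The contrast the abstract draws between stacking with data reuse and stacking with cross-validation can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the simulated data, the use of ordinary least squares for both the study-specific members and the stacking weights, the unconstrained weights, and the helper names `fit_ols`, `simulate_studies`, and `stacking_design`. In the cross-validated variant, a member's predictions on its own study come from fits that exclude the fold being predicted, so no observation is used both to train a member and to weight it.

```python
import numpy as np

rng = np.random.default_rng(0)


def fit_ols(X, y):
    """Least-squares coefficients for y ~ X (no intercept, for brevity)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def simulate_studies(n_studies=4, n=60, p=3):
    """Hypothetical data: each study has its own perturbed coefficient vector."""
    base = np.ones(p)
    studies = []
    for _ in range(n_studies):
        beta = base + rng.normal(scale=0.5, size=p)
        X = rng.normal(size=(n, p))
        y = X @ beta + rng.normal(size=n)
        studies.append((X, y))
    return studies


def stacking_design(studies, n_folds=None):
    """Build the matrix of member predictions used to fit stacking weights.

    Column k holds predictions from the member trained on study k.
    n_folds=None: every member is trained on its full study (data reuse).
    n_folds=int:  a member's predictions on its own study are out-of-fold.
    """
    members = [fit_ols(X, y) for X, y in studies]
    blocks = []
    for s, (X_s, y_s) in enumerate(studies):
        # Predictions of all members on study s.
        Z = np.column_stack([X_s @ b for b in members])
        if n_folds is not None:
            folds = np.arange(len(y_s)) % n_folds
            for f in range(n_folds):
                held = folds == f
                # Refit member s without the held-out fold, then predict it.
                b_cv = fit_ols(X_s[~held], y_s[~held])
                Z[held, s] = X_s[held] @ b_cv
        blocks.append(Z)
    y_all = np.concatenate([y for _, y in studies])
    return np.vstack(blocks), y_all


studies = simulate_studies()

# Stacking with data reuse: weights regress outcomes on in-sample predictions.
Z_reuse, y_all = stacking_design(studies)
w_reuse = fit_ols(Z_reuse, y_all)

# Stacking with within-study cross-validation (5 folds, an arbitrary choice).
Z_cv, _ = stacking_design(studies, n_folds=5)
w_cv = fit_ols(Z_cv, y_all)

print("weights with data reuse:      ", np.round(w_reuse, 3))
print("weights with cross-validation:", np.round(w_cv, 3))
```

Predictions for an unseen study would then be formed as the weighted sum of the full-data members' predictions. The two designs differ only in the blocks where a member predicts its own study, which is where data reuse occurs.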
