Paper title
Stable predictions for health related anticausal prediction tasks affected by selection biases: the need to deconfound the test set features
Authors
Abstract
In health-related machine learning applications, the training data often corresponds to a non-representative sample from the target populations where the learners will be deployed. In anticausal prediction tasks, selection biases often make the associations between confounders and the outcome variable unstable across different target environments. As a consequence, the predictions from confounded learners are often unstable and might fail to generalize to shifted test environments. Stable prediction approaches aim to solve this problem by producing predictions that are stable across unknown test environments. These approaches, however, are sometimes applied to the training data alone, in the hope that training an unconfounded model will be enough to generate stable predictions in shifted test sets. Here, we show that this is insufficient, and that improved stability can be achieved by deconfounding the test set features as well. We illustrate these observations using both synthetic data and real-world data from a mobile health study.
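The abstract does not say how the features are deconfounded, so the following is only an illustrative sketch, not the authors' procedure. It assumes a linear anticausal model in which the outcome `y` and an observed confounder `c` both cause a single feature `x`, and it assumes that selection bias shows up as a correlation `rho` between `y` and `c` that differs between environments. Deconfounding is done here by subtracting an estimated confounder effect `gamma * c` from the feature, with `gamma` estimated in the training set only. All names (`simulate`, `fit_line`, `gamma`, `rho`) and all numeric settings are hypothetical.

```python
import numpy as np

def simulate(n, rho, rng, noise_sd=0.5):
    """Draw (x, y, c) where y and c both cause x and corr(y, c) = rho."""
    cov = [[1.0, rho], [rho, 1.0]]
    y, c = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    x = y + c + noise_sd * rng.normal(size=n)  # anticausal: x is an effect of y
    return x, y, c

def fit_line(u, v):
    """Least-squares fit of v ~ a + b * u; returns (a, b)."""
    b, a = np.polyfit(u, v, 1)  # polyfit returns the slope first
    return a, b

def mse(a, b, u, v):
    """Mean squared error of the prediction a + b * u against v."""
    return float(np.mean((v - (a + b * u)) ** 2))

rng = np.random.default_rng(0)
x_tr, y_tr, c_tr = simulate(5000, 0.8, rng)  # biased training environment

# Assumed deconfounding step: estimate the effect of c on x by regressing
# x on (1, y, c) in the training set, then subtract gamma * c from x.
design = np.column_stack([np.ones_like(y_tr), y_tr, c_tr])
coef, *_ = np.linalg.lstsq(design, x_tr, rcond=None)
gamma = coef[2]

confounded = fit_line(x_tr, y_tr)                   # trained on raw features
deconfounded = fit_line(x_tr - gamma * c_tr, y_tr)  # trained on adjusted features

results = {}
for rho_test in (-0.8, 0.0, 0.8):  # shifted test environments
    x_te, y_te, c_te = simulate(5000, rho_test, rng)
    results[rho_test] = {
        # confounded learner, raw test features
        "confounded": mse(*confounded, x_te, y_te),
        # deconfounded learner, raw (still confounded) test features
        "train_only": mse(*deconfounded, x_te, y_te),
        # deconfounded learner, test features adjusted with the same gamma
        "train_and_test": mse(*deconfounded, x_te - gamma * c_te, y_te),
    }
    print(rho_test, results[rho_test])
```

In this toy setting, the `train_only` error moves with the test-set correlation because the raw test features still carry the confounder's contribution, whereas the `train_and_test` error stays roughly constant across environments. The adjustment needs only the test-set confounder values, not the test-set outcomes.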