Paper title
Statistical Inference for Maximin Effects: Identifying Stable Associations across Multiple Studies
Paper authors
Paper abstract
Integrative analysis of data from multiple sources is critical to making generalizable discoveries. Associations that are consistently observed across multiple source populations are more likely to generalize to target populations with possible distributional shifts. In this paper, we model heterogeneous multi-source data with multiple high-dimensional regressions and make inferences for the maximin effect (Meinshausen, Bühlmann, AoS, 43(4), 1801-1830). The maximin effect provides a measure of stable associations across multi-source data. A significant maximin effect indicates that a variable has commonly shared effects across multiple source populations, and these shared effects may generalize to a broader set of target populations. Inferring maximin effects is challenging because their point estimator can have a non-standard limiting distribution. We devise a novel sampling method to construct valid confidence intervals for maximin effects. The proposed confidence interval attains a parametric length. This sampling procedure and the related theoretical analysis are of independent interest for solving other non-standard inference problems. Using genetic data on yeast growth in multiple environments, we demonstrate that the genetic variants with significant maximin effects have generalizable effects under new environments.
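As a rough illustration of what "commonly shared effects" means, the sketch below computes a population-level maximin effect for two source groups. It is not the paper's inference procedure. It assumes the characterization from the cited Meinshausen and Bühlmann reference, in which the maximin effect is the point in the convex hull of the group-specific coefficient vectors that minimizes b^T Sigma b, with a covariance matrix Sigma shared by both groups. The function names, the coefficient vectors b1 and b2, and the identity covariance are all hypothetical choices made for illustration.

```python
def quad_form(u, sigma, v):
    """Return u^T Sigma v for plain Python lists."""
    return sum(u[i] * sigma[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))


def maximin_two_groups(b1, b2, sigma):
    """Maximin effect for two groups (illustrative assumption, see lead-in).

    Minimizes (g*b1 + (1-g)*b2)^T Sigma (g*b1 + (1-g)*b2) over g in [0, 1]
    and returns the minimizing combination together with the weight g.
    """
    d = [x - y for x, y in zip(b1, b2)]
    denom = quad_form(d, sigma, d)
    if denom == 0.0:
        # Both groups share the same coefficients; any weight gives b1.
        return list(b1), 1.0
    # Unconstrained minimizer of the quadratic in g, then clipped to [0, 1].
    g = -quad_form(d, sigma, b2) / denom
    g = min(1.0, max(0.0, g))
    beta = [g * x + (1.0 - g) * y for x, y in zip(b1, b2)]
    return beta, g


# Hypothetical coefficients: variable 1 has the same effect in both groups,
# variable 2 has effects of opposite sign.
b1 = [1.0, 1.0]
b2 = [1.0, -1.0]
sigma = [[1.0, 0.0], [0.0, 1.0]]  # assumed identity covariance

beta, g = maximin_two_groups(b1, b2, sigma)
print(beta, g)
```

In this toy setting the shared effect of the first variable is retained, while the second variable, whose effects have opposite signs in the two groups, is shrunk to zero. This is the sense in which a nonzero maximin effect flags an association that is stable across sources.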