Title
Bregman Divergence-Based Data Integration with Application to Polygenic Risk Score (PRS) Heterogeneity Adjustment
Authors
Abstract
Polygenic risk scores (PRS) have recently received much attention for genetic risk prediction. While PRS have been successful for the Caucasian population, PRS built on minority populations suffer from small sample sizes, high dimensionality, and low signal-to-noise ratios, exacerbating already severe health disparities. Because of population heterogeneity, direct trans-ethnic prediction that applies the Caucasian model to a minority population also has limited performance. In addition, because of data privacy constraints, individual-level genotype data are not accessible for either the Caucasian population or the minority population. To address these challenges, we propose a Bregman divergence-based estimation procedure that measures and optimally balances the information from different populations. The proposed method requires only encrypted summary statistics and improves PRS performance for ethnic minority groups by incorporating additional information. We establish asymptotic consistency and the weak oracle property for the proposed method. Simulations and real data analyses also show its advantages in prediction and variable selection.