Paper title
A robust statistical method for Genome-wide association analysis of human copy number variation
Paper authors
Abstract
Genome-wide association studies (GWAS) at the copy number variation (CNV) level remain a little-explored field in which little statistical progress has been made. Traditional methods suffer from problems such as batch effects and heterogeneity across the genome, which lead to low power or a high false discovery rate. We develop a new robust method for finding disease-risk regions in which CNVs are disproportionately distributed between case and control samples; our test formula remains robust even when there are batch effects between the two groups. We propose a new empirical Bayes rule to deal with overfitting when estimating parameters during testing. This rule can be extended to model selection, where it can be more efficient than traditional methods when there are too many candidate models to specify. We also give solid theoretical guarantees for the proposed method and demonstrate its effectiveness through simulation and real data analysis.