Title

Measuring the severity of multi-collinearity in high dimensions

Authors

Deng, Wei Q.; Craiu, Radu V.; Sun, Lei

Abstract

Multi-collinearity is a widespread phenomenon in modern statistical applications and, when ignored, can negatively impact model selection and statistical inference. Classic tools and measures that were developed for "$n>p$" data are neither applicable nor interpretable in the high-dimensional regime. Here we propose 1) new individualized measures that can be used to visualize patterns of multi-collinearity, and subsequently 2) global measures to assess the overall burden of multi-collinearity without restricting the dimensions of the observed data. We applied these measures in a genomic application to investigate patterns of multi-collinearity in genetic variation across individuals with diverse ancestral backgrounds. The measures were able to visually distinguish genomic regions of excessive multi-collinearity and to contrast the level of multi-collinearity between different continental populations.
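Why classic "$n>p$" diagnostics break down can be illustrated with the variance inflation factor, $\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ is the coefficient of determination from regressing column $j$ on an intercept and all other columns. The abstract does not name VIF; treating it as a representative "classic tool" is our assumption, and the sketch below is not the authors' individualized or global measure. It is a minimal pure-Python illustration (the helper names `r_squared` and `vif` are hypothetical): once $p \ge n$, the intercept and the other $p-1$ columns generically span all of $\mathbb{R}^n$, so every $R_j^2 = 1$ and every VIF is infinite, carrying no information about which variables are more collinear than others.

```python
import math
import random


def _residual(y, basis):
    """Remove from y its projection onto an orthonormal basis (list of vectors)."""
    r = [float(a) for a in y]
    for q in basis:
        c = sum(a * b for a, b in zip(r, q))
        r = [a - c * b for a, b in zip(r, q)]
    return r


def _orthonormal_basis(columns, tol=1e-10):
    """Modified Gram-Schmidt; drops columns already (numerically) in the span."""
    basis = []
    for col in columns:
        r = _residual(col, basis)
        norm = math.sqrt(sum(a * a for a in r))
        if norm > tol:
            basis.append([a / norm for a in r])
    return basis


def r_squared(y, predictors):
    """R^2 of an OLS fit of y on an intercept plus the predictor columns.

    Assumes y is not constant (otherwise the total sum of squares is zero).
    """
    n = len(y)
    basis = _orthonormal_basis([[1.0] * n] + list(predictors))
    rss = sum(a * a for a in _residual(y, basis))
    mean = sum(y) / n
    tss = sum((a - mean) ** 2 for a in y)
    return 1.0 - rss / tss


def vif(columns, tol=1e-9):
    """Classic VIF_j = 1 / (1 - R_j^2) for each column; inf when R_j^2 is ~1."""
    out = []
    for j, col in enumerate(columns):
        others = columns[:j] + columns[j + 1:]
        r2 = r_squared(col, others)
        out.append(math.inf if r2 >= 1.0 - tol else 1.0 / (1.0 - r2))
    return out


# Low-dimensional case: n = 5 observations, p = 2 columns (correlation 0.8).
print(vif([[1, 2, 3, 4, 5], [2, 1, 4, 3, 5]]))  # both entries about 2.78

# High-dimensional case: n = 5 observations, p = 6 simulated columns.
rng = random.Random(2024)
wide = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(6)]
print(vif(wide))  # every entry is inf: generically R_j^2 = 1 when p >= n
```

Gram-Schmidt with a drop tolerance stands in for the usual formula (VIFs as the diagonal of the inverse sample correlation matrix) because that $p \times p$ matrix has rank at most $n-1$ and is singular once $p \ge n$; the `tol` values are arbitrary choices for this sketch.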
