Paper title
Decomposition of Explained Variation in the Linear Mixed Model
Paper authors
Paper abstract
In the linear mixed model (LMM), the simultaneous assessment and comparison of dispersion relevance of explanatory variables associated with fixed and random effects remains an important open practical problem. Based on the restricted maximum likelihood equations in the variance components form of the LMM, we prove a proper decomposition of the sum of squares of the dependent variable into unbiased estimators of interpretable estimands of explained variation. This result leads to a natural extension of the well-known adjusted coefficient of determination to the LMM. Further, we allocate the novel unbiased estimators of explained variation to specific contributions of covariates associated with fixed and random effects within a single model fit. These parameter-wise explained variations constitute easily interpretable quantities, assessing dispersion relevance of covariates associated with both fixed and random effects on a common scale, thus allowing for a covariate ranking. For illustration, we contrast the variation explained by subjects and time in the longitudinal sleep deprivation study. By comparing the dispersion relevance of population characteristics and spatial levels, we determine literacy as a major driver of income inequality in Burkina Faso. Finally, we develop a novel relevance plot to visualize the dispersion relevance of high-dimensional genomic markers in Arabidopsis thaliana.
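As background for the extension mentioned in the abstract, the classical adjusted coefficient of determination for an ordinary linear model (fixed effects only, no random effects) can be sketched as below. This is only the well-known baseline quantity, not the LMM estimator proposed in the paper; the function name `adjusted_r2` and the toy data are illustrative assumptions.

```python
import numpy as np


def adjusted_r2(y, X):
    """Classical adjusted R^2 for ordinary least squares with an intercept.

    Hypothetical helper for illustration only: X holds n rows and p covariate
    columns (without an intercept column). This is NOT the LMM extension
    described in the abstract.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, p = X.shape

    # Design matrix with an explicit intercept column.
    design = np.column_stack([np.ones(n), X])

    # Least-squares fit and residuals.
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta

    rss = float(resid @ resid)                # residual sum of squares
    tss = float(((y - y.mean()) ** 2).sum())  # total (centered) sum of squares

    # Ratio of unbiased variance estimates: RSS/(n-p-1) over TSS/(n-1).
    return 1.0 - (rss / (n - p - 1)) / (tss / (n - 1))


# Toy data (made up for illustration): one covariate, four observations.
X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
y_demo = np.array([0.0, 1.0, 0.0, 1.0])
r2_adj_demo = adjusted_r2(y_demo, X_demo)
```

Both sums of squares are divided by their degrees of freedom, which is what distinguishes the adjusted coefficient from the plain R² and lets it fall below zero for a weak fit.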