Title
Analyzing Clustered Continuous Response Variables with Ordinal Regression Models
Authors
Abstract
Continuous response variables often need to be transformed to meet regression modeling assumptions; however, finding the optimal transformation is challenging and results may vary with the choice of transformation. When a continuous response variable is measured repeatedly for a subject or the continuous responses arise from clusters, it is more challenging to model the continuous response data due to correlation within clusters. We extend a widely used ordinal regression model, the cumulative probability model (CPM), to fit clustered continuous response variables based on generalized estimating equation (GEE) methods for ordinal responses. With our approach, estimates of marginal parameters, cumulative distribution functions (CDFs), expectations, and quantiles conditional on covariates can be obtained without pre-transformation of the potentially skewed continuous response data. Computational challenges arise with large numbers of distinct values of the continuous response variable, and we propose two feasible and computationally efficient approaches to fit CPMs for clustered continuous response variables with different working correlation structures. We study finite sample operating characteristics of the estimators via simulation, and illustrate their implementation with two data examples. One studies predictors of CD4:CD8 ratios in an HIV study. The other uses data from The Lung Health Study to investigate the contribution of a single nucleotide polymorphism to lung function decline.
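To make concrete how CDFs, expectations, and quantiles conditional on covariates follow from a fitted CPM without transforming the response, here is a minimal sketch. It is not the authors' implementation and does not perform the GEE fitting step. It assumes a logit link, the sign convention G[P(Y <= y_j | x)] = alpha_j - x'beta, and a step-function quantile with no interpolation. The function names and all numeric values are hypothetical, chosen only for illustration.

```python
import numpy as np


def expit(z):
    """Inverse logit link (assumed link; other links are possible)."""
    return 1.0 / (1.0 + np.exp(-z))


def cpm_cdf(alpha, beta, x):
    """Conditional CDF at each distinct response value.

    alpha: J-1 increasing intercepts, one per distinct value except the largest.
    Assumed convention: P(Y <= y_j | x) = expit(alpha_j - x'beta).
    """
    eta = float(np.dot(x, beta))
    cdf = expit(np.asarray(alpha, dtype=float) - eta)
    return np.append(cdf, 1.0)  # the CDF equals 1 at the largest observed value


def cpm_mean(y_values, cdf):
    """Conditional expectation: sum of y_j times the probability mass at y_j."""
    pmf = np.diff(cdf, prepend=0.0)
    return float(np.dot(y_values, pmf))


def cpm_quantile(y_values, cdf, p):
    """Smallest distinct value whose CDF is at least p (no interpolation)."""
    idx = int(np.searchsorted(cdf, p, side="left"))
    return float(y_values[idx])


# Hypothetical fitted values, for illustration only.
y_values = np.array([1.0, 2.0, 3.0, 4.0])  # distinct observed responses
alpha = np.array([-1.0, 0.0, 1.0])         # intercepts alpha_1 < alpha_2 < alpha_3
beta = np.array([0.5])                     # marginal covariate effect

cdf_x0 = cpm_cdf(alpha, beta, x=[0.0])
cdf_x2 = cpm_cdf(alpha, beta, x=[2.0])

mean_x0 = cpm_mean(y_values, cdf_x0)
median_x0 = cpm_quantile(y_values, cdf_x0, 0.5)
median_x2 = cpm_quantile(y_values, cdf_x2, 0.5)
```

Under this sign convention, a positive coefficient shifts the conditional distribution toward larger responses, so the median at x = 2 is no smaller than the median at x = 0. With a continuous response, the number of intercepts grows with the number of distinct values, which is the computational burden the abstract refers to.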