Paper Title

Estimation, Confidence Intervals, and Large-Scale Hypotheses Testing for High-Dimensional Mixed Linear Regression

Authors

Linjun Zhang, Rong Ma, T. Tony Cai, Hongzhe Li

Abstract

This paper studies the high-dimensional mixed linear regression (MLR) where the output variable comes from one of the two linear regression models with an unknown mixing proportion and an unknown covariance structure of the random covariates. Building upon a high-dimensional EM algorithm, we propose an iterative procedure for estimating the two regression vectors and establish their rates of convergence. Based on the iterative estimators, we further construct debiased estimators and establish their asymptotic normality. For individual coordinates, confidence intervals centered at the debiased estimators are constructed. Furthermore, a large-scale multiple testing procedure is proposed for testing the regression coefficients and is shown to control the false discovery rate (FDR) asymptotically. Simulation studies are carried out to examine the numerical performance of the proposed methods and their superiority over existing methods. The proposed methods are further illustrated through an analysis of a dataset of multiplex image cytometry, which investigates the interaction networks among the cellular phenotypes that include the expression levels of 20 epitopes or combinations of markers.
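To make the model in the abstract concrete, here is a minimal sketch of EM for a two-component mixed linear regression. It is not the paper's procedure: it is the classical low-dimensional, unpenalized EM with Gaussian noise, and it omits the high-dimensional regularization, the debiasing step, and the FDR-controlling test. The function name `em_mixed_linear_regression`, the shared noise level `sigma`, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def em_mixed_linear_regression(X, y, beta1, beta2, omega=0.5, sigma=1.0, n_iter=200):
    """Plain (unpenalized) EM for a two-component mixed linear regression.

    Assumed model (illustrative, not the paper's notation):
    y_i = x_i^T beta1 + eps_i with probability omega,
    y_i = x_i^T beta2 + eps_i otherwise, with eps_i ~ N(0, sigma^2).
    """
    n = X.shape[0]
    for _ in range(n_iter):
        # E-step: posterior probability that each sample belongs to component 1.
        r1 = y - X @ beta1
        r2 = y - X @ beta2
        log_ratio = (np.log1p(-omega) - np.log(omega)
                     + 0.5 * (r1 ** 2 - r2 ** 2) / sigma ** 2)
        gamma = 1.0 / (1.0 + np.exp(np.clip(log_ratio, -700.0, 700.0)))

        # M-step: weighted least squares for each regression vector.
        w1, w2 = gamma, 1.0 - gamma
        beta1 = np.linalg.solve(X.T @ (w1[:, None] * X), X.T @ (w1 * y))
        beta2 = np.linalg.solve(X.T @ (w2[:, None] * X), X.T @ (w2 * y))

        # Update the mixing proportion and the shared noise level.
        omega = float(np.clip(gamma.mean(), 1e-6, 1.0 - 1e-6))
        r1 = y - X @ beta1
        r2 = y - X @ beta2
        sigma = float(max(np.sqrt(np.sum(w1 * r1 ** 2 + w2 * r2 ** 2) / n), 1e-8))
    return beta1, beta2, omega, sigma


# Simulated example (all values are made up for illustration).
rng = np.random.default_rng(0)
n, p = 2000, 3
beta1_true = np.array([2.0, -1.0, 0.5])
beta2_true = np.array([-2.0, 1.0, 1.5])
X = rng.normal(size=(n, p))
z = rng.random(n) < 0.4  # latent labels, true mixing proportion 0.4
y = np.where(z, X @ beta1_true, X @ beta2_true) + 0.5 * rng.normal(size=n)

# EM is sensitive to initialization; here we start near the truth on purpose.
b1, b2, omega_hat, sigma_hat = em_mixed_linear_regression(
    X, y, beta1_true + 0.3, beta2_true - 0.3
)
print(b1, b2, omega_hat, sigma_hat)
```

In the high-dimensional setting the abstract describes, the weighted least-squares updates would be replaced by regularized ones, and inference on individual coefficients would rely on the debiased estimators rather than on the raw EM output.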
