Paper title
Canonical thresholding for non-sparse high-dimensional linear regression
Paper authors
Paper abstract
We consider a high-dimensional linear regression problem. Unlike many papers on the topic, we do not require sparsity of the regression coefficients; instead, our main structural assumption is a decay of the eigenvalues of the covariance matrix of the data. We propose a new family of estimators, called the canonical thresholding estimators, which pick the largest regression coefficients in the canonical form. The estimators admit an explicit form and can be linked to LASSO and Principal Component Regression (PCR). A theoretical analysis for both fixed design and random design settings is provided. The bounds obtained on the mean squared error and the prediction error of a specific estimator from the family allow us to state clearly sufficient conditions on the decay of eigenvalues that ensure convergence. In addition, we promote the use of relative errors, which are strongly linked with the out-of-sample $R^2$. The study of these relative errors leads to a new concept of joint effective dimension, which incorporates the covariance of the data and the regression coefficients simultaneously, and describes the complexity of a linear regression problem. Some minimax lower bounds are established to showcase the optimality of our procedure. Numerical simulations confirm the good performance of the proposed estimators compared to previously developed methods.
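The abstract does not give the explicit form of the estimators, so the following is only a minimal sketch of one plausible reading of "picking the largest regression coefficients in the canonical form". It assumes the canonical form is obtained from the singular value decomposition of the design matrix and that selection is a hard threshold keeping the `k` largest canonical coefficients. The function name `canonical_threshold_fit`, the parameter `k`, and the scaling of the canonical coefficients are illustrative assumptions, not the paper's definitions.

```python
import numpy as np


def canonical_threshold_fit(X, y, k):
    """Hypothetical helper: hard-threshold regression in SVD coordinates.

    Assumption (not taken from the paper): the canonical coefficients are
    the projections of y onto the left singular vectors of X, and the k
    largest in absolute value are kept.
    """
    # Thin SVD: X = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(X, full_matrices=False)

    # Coefficients of y in the rotated (canonical) coordinates.
    z = U.T @ y

    # Keep the k components with the largest |z_j|, whatever their
    # singular value; PCR would instead keep the k largest s_j.
    keep = np.zeros(z.shape, dtype=bool)
    keep[np.argsort(-np.abs(z))[:k]] = True

    # Never invert numerically zero singular values.
    keep &= s > 1e-12 * s.max()

    # Map the retained coefficients back to the original coordinates.
    safe_s = np.where(keep, s, 1.0)
    beta = Vt.T @ (np.where(keep, z, 0.0) / safe_s)
    return beta, keep
```

In this sketch, selecting components by the size of `z` rather than by the size of the singular values `s` is what separates the rule from PCR, and replacing the hard threshold with a soft one would give a LASSO-like shrinkage in the rotated coordinates. With `k` equal to the rank of `X`, the result coincides with the minimum-norm least squares solution.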