Title
$σ$-Ridge: group regularized ridge regression via empirical Bayes noise level cross-validation
Authors
Abstract
Features in predictive models are not exchangeable, yet common supervised models treat them as such. Here we study ridge regression when the analyst can partition the features into $K$ groups based on external side-information. For example, in high-throughput biology, features may represent gene expression, protein abundance, or clinical data, so each feature group represents a distinct modality. The analyst's goal is to choose optimal regularization parameters $λ = (λ_1, \dotsc, λ_K)$ -- one for each group. In this work, we study the impact of $λ$ on the predictive risk of group-regularized ridge regression by deriving limiting risk formulae under a high-dimensional random effects model with $p \asymp n$ as $n \to \infty$. Furthermore, we propose a data-driven method for choosing $λ$ that attains the optimal asymptotic risk: the key idea is to interpret the residual noise variance $σ^2$ as a regularization parameter to be chosen through cross-validation. An empirical Bayes construction maps the one-dimensional parameter $σ$ to the $K$-dimensional vector of regularization parameters, i.e., $σ \mapsto \widehatλ(σ)$. Beyond its theoretical optimality, the proposed method is practical and runs as fast as cross-validated ridge regression without feature groups ($K=1$).
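The construction described above -- a one-dimensional map $σ \mapsto \widehatλ(σ)$ followed by cross-validation over $σ$ alone -- can be sketched as follows. This is a minimal illustration, not the authors' exact estimator: `sigma_to_lambda` uses a crude moment-based proxy for each group's signal strength $\alpha_k^2$, which is an assumption made here for concreteness.

```python
import numpy as np

def group_ridge(X, y, groups, lam):
    """Group-regularized ridge: min ||y - Xb||^2/n + sum_k lam[k] ||b_k||^2.

    `groups` is an integer array of length p assigning each feature
    to one of K groups; `lam` holds one penalty per group.
    """
    n, p = X.shape
    penalty = np.asarray([lam[g] for g in groups])  # per-feature penalty
    A = X.T @ X / n + np.diag(penalty)
    return np.linalg.solve(A, X.T @ y / n)

def sigma_to_lambda(X, y, groups, sigma, K):
    """Hypothetical empirical-Bayes map sigma -> lambda(sigma).

    Sets lam_k proportional to sigma^2 / alpha_k^2, where alpha_k^2 is
    a rough moment-based estimate of group-k signal strength obtained
    from ||X_k^T y / n||^2 with a noise term subtracted (an assumed,
    simplified stand-in for the paper's estimator).
    """
    n, p = X.shape
    lam = np.empty(K)
    for k in range(K):
        idx = groups == k
        p_k = int(idx.sum())
        moment = np.sum((X[:, idx].T @ y / n) ** 2)
        alpha2 = max(moment - sigma**2 * p_k / n, 1e-8)  # floor at a tiny value
        lam[k] = sigma**2 * p_k / (n * alpha2)
    return lam

def sigma_ridge_cv(X, y, groups, K, sigma_grid, n_folds=5):
    """Cross-validate over the single parameter sigma, as in the abstract."""
    n = X.shape[0]
    folds = np.array_split(np.random.permutation(n), n_folds)
    best_sigma, best_err = None, np.inf
    for sigma in sigma_grid:
        err = 0.0
        for holdout in folds:
            train = np.setdiff1d(np.arange(n), holdout)
            lam = sigma_to_lambda(X[train], y[train], groups, sigma, K)
            b = group_ridge(X[train], y[train], groups, lam)
            err += np.mean((y[holdout] - X[holdout] @ b) ** 2)
        if err < best_err:
            best_sigma, best_err = sigma, err
    return best_sigma, sigma_to_lambda(X, y, groups, best_sigma, K)
```

Because the grid search is over the scalar $σ$ rather than a $K$-dimensional grid of penalties, the cost matches ordinary cross-validated ridge regression, as the abstract notes.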