Paper title
Outlier-robust sparse/low-rank least-squares regression and robust matrix completion
Paper authors
Paper abstract
We study high-dimensional least-squares regression within a subgaussian statistical learning framework with heterogeneous noise. It includes $s$-sparse and $r$-low-rank least-squares regression when a fraction $ε$ of the labels is adversarially contaminated. We also present a novel theory of trace regression with matrix decomposition based on a new application of the product process. For these problems, we show novel near-optimal "subgaussian" estimation rates of the form $r(n,d_{e})+\sqrt{\log(1/δ)/n}+ε\log(1/ε)$, valid with probability at least $1-δ$. Here, $r(n,d_{e})$ is the optimal uncontaminated rate as a function of the effective dimension $d_{e}$ but independent of the failure probability $δ$. These rates are valid uniformly in $δ$, i.e., the estimators' tuning does not depend on $δ$. Lastly, we consider noisy robust matrix completion with non-uniform sampling. If only the low-rank matrix is of interest, we present a novel near-optimal rate that is independent of the corruption level $a$. Our estimators are tractable and based on a new "sorted" Huber-type loss. No information on $(s,r,ε,a)$ is needed to tune these estimators. Our analysis makes use of novel $δ$-optimal concentration inequalities for the multiplier and product processes, which could be useful elsewhere. For instance, they imply novel sharp oracle inequalities for Lasso and Slope with optimal dependence on $δ$. Numerical simulations confirm our theoretical predictions. In particular, "sorted" Huber regression can outperform classical Huber regression.
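The abstract's "sorted" Huber-type loss pairs observation-specific Huber thresholds with residuals ordered by magnitude, in the same spirit in which Slope pairs sorted coefficients with a decreasing penalty sequence. The sketch below is an illustrative interpretation of that idea, not the paper's exact definition: the function names (`huber`, `sorted_huber_loss`) and the pairing convention (largest absolute residuals matched to the smallest thresholds, so that suspected outliers are truncated most aggressively) are assumptions for exposition.

```python
import numpy as np

def huber(r, tau):
    # Standard Huber function: quadratic for |r| <= tau, linear beyond.
    a = np.abs(r)
    return np.where(a <= tau, 0.5 * r**2, tau * a - 0.5 * tau**2)

def sorted_huber_loss(residuals, taus):
    """Illustrative "sorted" Huber-type loss (an assumed form, not the
    paper's exact definition): each observation gets its own threshold,
    with the largest |residual| paired with the smallest threshold."""
    r = np.asarray(residuals, dtype=float)
    taus = np.sort(np.asarray(taus, dtype=float))   # thresholds, ascending
    order = np.argsort(-np.abs(r))                  # indices, largest |r| first
    loss = np.empty_like(r)
    loss[order] = huber(r[order], taus)             # largest |r| -> smallest tau
    return loss.sum()

# A large residual is clipped by the small threshold, a small one stays quadratic:
total = sorted_huber_loss([0.1, 10.0], [0.5, 5.0])
print(total)  # 0.5*10 - 0.5*0.25 + 0.5*0.1**2 = 4.88
```

Under this pairing, the loss interpolates between least squares (all thresholds large) and a heavily trimmed objective on the observations flagged as outliers, which matches the abstract's claim that no knowledge of the contamination fraction $ε$ is needed for tuning.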