Paper Title
High-dimensional Measurement Error Models for Lipschitz Loss
Authors
Abstract
Recently emerging large-scale biomedical data pose exciting opportunities for scientific discoveries. However, the ultrahigh dimensionality of these data and their non-negligible measurement errors can make estimation difficult. Existing methods for high-dimensional covariates with measurement error are limited; they usually require knowledge of the noise distribution and focus on linear or generalized linear models. In this work, we develop high-dimensional measurement error models for a class of Lipschitz loss functions that encompasses logistic regression, hinge loss and quantile regression, among others. Our estimator is designed to minimize the $L_1$ norm among all estimators belonging to suitable feasible sets, without requiring any knowledge of the noise distribution. Subsequently, we generalize these estimators to a Lasso analog that is computationally scalable to higher dimensions. We derive theoretical guarantees in terms of finite-sample statistical error bounds and sign consistency, even when the dimensionality increases exponentially with the sample size. Extensive simulation studies demonstrate superior performance compared to existing methods in classification and quantile regression problems. An application to a gender classification task based on brain functional connectivity in the Human Connectome Project data illustrates the improved accuracy of our approach and its ability to reliably identify significant brain connections that drive gender differences.
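As an illustration of the loss class named in the abstract, the following minimal sketch numerically checks the Lipschitz property of the logistic, hinge and quantile (check) losses with respect to the linear predictor. It is not the paper's estimator: the function names (`logistic_loss`, `hinge_loss`, `check_loss`, `max_slope`), the evaluation grid and the choice `tau = 0.25` are assumptions made only for this example.

```python
import math

def logistic_loss(y, f):
    """Logistic loss log(1 + exp(-y*f)) for a label y in {-1, +1}."""
    m = -y * f
    # Numerically stable evaluation of log(1 + exp(m)).
    return max(m, 0.0) + math.log1p(math.exp(-abs(m)))

def hinge_loss(y, f):
    """Hinge loss max(0, 1 - y*f) for a label y in {-1, +1}."""
    return max(0.0, 1.0 - y * f)

def check_loss(y, f, tau=0.5):
    """Quantile (check) loss r * (tau - 1{r < 0}) with residual r = y - f."""
    r = y - f
    return r * (tau - (1.0 if r < 0 else 0.0))

def max_slope(loss, y, grid):
    """Largest |loss(y, a) - loss(y, b)| / |a - b| over adjacent grid points."""
    return max(
        abs(loss(y, a) - loss(y, b)) / abs(a - b)
        for a, b in zip(grid, grid[1:])
    )

# Hypothetical grid of linear-predictor values in [-5, 5].
grid = [i / 10.0 for i in range(-50, 51)]

slopes = {
    "logistic": max_slope(logistic_loss, 1, grid),
    "hinge": max_slope(hinge_loss, 1, grid),
    "quantile": max_slope(lambda y, f: check_loss(y, f, tau=0.25), 1.0, grid),
}

for name, value in slopes.items():
    print(f"{name}: largest observed slope = {value:.4f}")
```

The observed slopes stay bounded by 1 for the logistic and hinge losses and by max(tau, 1 - tau) for the check loss, which are the standard Lipschitz constants of these losses in the predictor.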