论文标题

高维鲁棒统计数据的流算法

Streaming Algorithms for High-Dimensional Robust Statistics

论文作者

Diakonikolas, Ilias, Kane, Daniel M., Pensia, Ankit, Pittas, Thanasis

论文摘要

我们研究流模型中的高维鲁棒统计任务。最近的工作线获得了一系列高维鲁棒估计任务的计算有效算法。不幸的是,所有以前的算法都需要存储整个数据集,从而在维度中至少产生二次记忆。在这项工作中,我们为高维鲁棒统计数据开发了第一个有效的流媒体算法,具有近乎最佳的内存要求(最多是对数因素)。我们的主要结果是在Huber污染模型(增强)高维鲁棒均值估计的任务中。我们为此任务提供了有效的单通路流算法,并在维度几乎可以保证了几乎最佳的误差和空间复杂性。作为推论,我们获得了一些更复杂的任务的流算法,包括近乎最佳的空间复杂性,包括强大的协方差估计,健壮的回归和更广泛的强大随机优化。

We study high-dimensional robust statistics tasks in the streaming model. A recent line of work obtained computationally efficient algorithms for a range of high-dimensional robust estimation tasks. Unfortunately, all previous algorithms require storing the entire dataset, incurring memory at least quadratic in the dimension. In this work, we develop the first efficient streaming algorithms for high-dimensional robust statistics with near-optimal memory requirements (up to logarithmic factors). Our main result is for the task of high-dimensional robust mean estimation in (a strengthening of) Huber's contamination model. We give an efficient single-pass streaming algorithm for this task with near-optimal error guarantees and space complexity nearly-linear in the dimension. As a corollary, we obtain streaming algorithms with near-optimal space complexity for several more complex tasks, including robust covariance estimation, robust regression, and more generally robust stochastic optimization.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源