Paper Title

Online Updating Huber Robust Regression for Big Data Streams

Authors

Tao, Chunbai; Wang, Shanshan

Abstract

Big data streams are attracting increasing attention with the development of modern science and information technology. Because limited computer memory cannot hold high-volume streaming data, real-time methods that do not store historical data are worth investigating. Moreover, outliers may arise as data streams are generated at high velocity, which calls for more robust analysis. Motivated by these concerns, this paper proposes a novel Online Updating Huber Robust Regression algorithm. By extracting key features from each new data subset, it obtains a computationally efficient online updating estimator without storing historical data. Meanwhile, by integrating Huber regression into the framework, the estimator is robust to contaminated data streams, such as heavy-tailed or heterogeneously distributed ones, as well as to cases with outliers. In addition, the proposed online updating estimator is asymptotically equivalent to the oracle estimator computed from the entire data set, and it has lower computational complexity. Extensive numerical simulations and a real data analysis are conducted to evaluate the estimation and computational efficiency of the proposed method.
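To make the idea of updating a Huber estimate from batch summaries concrete, the following is a minimal Python sketch. It is not the algorithm from the paper, whose details are not given in the abstract. It swaps in a generic scheme: fit Huber regression on each incoming batch by iteratively reweighted least squares, then combine the batch estimates with Hessian-type weights so that only a p-by-p matrix and a length-p vector are kept between batches. The names `huber_irls` and `OnlineHuberSketch`, the threshold `tau=1.345`, and the assumption that the noise scale is roughly 1 (so a fixed threshold is reasonable) are all illustrative choices, not taken from the source.

```python
import numpy as np


def huber_irls(X, y, tau=1.345, n_iter=50, tol=1e-8):
    """Fit Huber regression on one batch by iteratively reweighted least squares.

    Assumes the noise scale is about 1, so tau is used as a fixed threshold.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least squares start
    for _ in range(n_iter):
        r = y - X @ beta
        # Huber weights: 1 for small residuals, tau/|r| for large ones
        w = np.minimum(1.0, tau / np.maximum(np.abs(r), 1e-12))
        XtW = X.T * w  # scales each observation (column of X.T) by its weight
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


class OnlineHuberSketch:
    """Hypothetical streaming aggregator; stores summaries only, never raw data."""

    def __init__(self, p, tau=1.345):
        self.tau = tau
        self.J = np.zeros((p, p))   # running sum of batch Hessian-type matrices
        self.Jb = np.zeros(p)       # running sum of J_b @ beta_b

    def update(self, X, y):
        """Absorb one batch and return the current aggregated estimate."""
        beta_b = huber_irls(X, y, self.tau)
        r = y - X @ beta_b
        # Second derivative of the Huber loss: 1 inside the threshold, 0 outside
        inside = (np.abs(r) <= self.tau).astype(float)
        J_b = (X.T * inside) @ X
        self.J += J_b
        self.Jb += J_b @ beta_b
        return self.estimate()

    def estimate(self):
        """Weighted combination of all batch estimates seen so far."""
        return np.linalg.solve(self.J, self.Jb)
```

In use, one would create `OnlineHuberSketch(p)` once and call `update(X_batch, y_batch)` as each batch arrives, discarding the batch afterwards. Memory use stays at order p squared regardless of how many batches have been seen, which is the property the abstract emphasizes. Whether this particular aggregation matches the paper's estimator or its asymptotic guarantees is not something the abstract confirms.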
