Paper Title

Wavelet-Based Hybrid Machine Learning Model for Out-of-Distribution Internet Traffic Prediction

Authors

Sajal Saha, Anwar Haque, Greg Sidebottom

Abstract

Efficient prediction of internet traffic is essential for proactive management of computer networks. Machine learning approaches now show promising performance in modeling complex real-world traffic. However, most existing works assume that model training and evaluation data come from an identical distribution. In practice, a deployed model is highly likely to encounter data from a slightly or entirely unknown distribution. This paper investigates and evaluates the performance of eXtreme Gradient Boosting, Light Gradient Boosting Machine, Stochastic Gradient Descent, Gradient Boosting Regressor, CatBoost Regressor, and their stacked ensemble model, using both in-distribution and out-of-distribution data. Because the standalone models were unable to generalize well, we also propose a hybrid machine learning model that integrates wavelet decomposition to improve out-of-distribution prediction. Our experimental results show that the standalone ensemble model achieves a best accuracy of 96.4%, and that the hybrid ensemble model improves on it by 1% for in-distribution data. Performance dropped significantly, however, in tests on three different datasets whose distributions are shifted relative to the training set. Nevertheless, our proposed hybrid model considerably reduces the performance gap between in-distribution and out-of-distribution evaluation compared with the standalone model, indicating the effectiveness of the decomposition technique for out-of-distribution generalization.
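
To make the decomposition idea concrete, the following is a minimal sketch of a single-level wavelet decomposition of a traffic series. The abstract does not state which wavelet family, decomposition level, or recombination scheme the authors use, so the Haar wavelet, the single level, the helper names, and the sample values below are all assumptions chosen for illustration only.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_step(signal):
    """One level of the orthonormal Haar transform (assumed wavelet).

    Returns (approximation, detail) coefficient lists, each half as long
    as the input.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) / SQRT2 for a, b in pairs]
    detail = [(a - b) / SQRT2 for a, b in pairs]
    return approx, detail

def haar_inverse_step(approx, detail):
    """Invert haar_step: rebuild the full-length signal from coefficients."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out

def additive_components(signal):
    """Split a series into a smooth part and a fluctuation part.

    Both parts have the same length as the input and sum back to it,
    so each could be handed to a separate regressor.
    """
    approx, detail = haar_step(signal)
    smooth = haar_inverse_step(approx, [0.0] * len(detail))
    fluctuation = haar_inverse_step([0.0] * len(approx), detail)
    return smooth, fluctuation

# Hypothetical traffic volumes, not taken from the paper.
traffic = [120.0, 130.0, 90.0, 110.0, 300.0, 280.0, 150.0, 170.0]
smooth, fluctuation = additive_components(traffic)

# The two components add back up to the original series.
reconstructed = [s + f for s, f in zip(smooth, fluctuation)]
```

In a hybrid setup of the kind the abstract describes, one plausible arrangement (again an assumption, not the authors' stated method) is to forecast the smooth and fluctuation components with separate models and sum the forecasts.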
