Paper Title
Improving Solar Flare Prediction by Time Series Outlier Detection
Paper Authors
Paper Abstract
Solar flares not only pose risks to outer-space technologies and astronauts' well-being, but also cause disruptions on Earth to the high-tech, interconnected infrastructure on which our lives heavily depend. While a number of machine-learning methods have been proposed to improve flare prediction, none of them, to the best of our knowledge, has investigated the impact of outliers on the reliability and performance of those models. In this study, we investigate the impact of outliers in a multivariate time series benchmark dataset, namely SWAN-SF, on flare prediction models, and test our hypothesis: that there exist outliers in SWAN-SF whose removal enhances the performance of the prediction models on unseen datasets. We employ Isolation Forest to detect the outliers among the weaker flare instances. Several experiments are carried out using a wide range of contamination rates, which determine the percentage of outliers assumed to be present. We assess the quality of each dataset in terms of its actual contamination using TimeSeriesSVC. In our best finding, we achieve a 279% increase in True Skill Statistic and a 68% increase in Heidke Skill Score. The results show that, overall, a significant improvement in flare prediction can be achieved if outliers are detected and removed properly.
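
For readers unfamiliar with the two skill scores named in the abstract, the following is a minimal sketch of how they are commonly computed from a binary confusion matrix. The abstract does not state which variant of the Heidke Skill Score is used, so the form below (often labelled HSS2 in the flare-forecasting literature) is an assumption, as are the function names and the example counts, which are purely illustrative and not taken from the paper.

```python
def true_skill_statistic(tp, fn, fp, tn):
    """TSS = recall - false-alarm rate; ranges from -1 to 1."""
    return tp / (tp + fn) - fp / (fp + tn)


def heidke_skill_score(tp, fn, fp, tn):
    """HSS (assumed HSS2 form): improvement over a random-chance forecast."""
    numerator = 2 * (tp * tn - fn * fp)
    denominator = (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    return numerator / denominator


def relative_increase(before, after):
    """Percentage change of a score, as in 'an X% increase in TSS'."""
    return 100.0 * (after - before) / before


if __name__ == "__main__":
    # Hypothetical confusion-matrix counts, for illustration only.
    tp, fn, fp, tn = 40, 10, 20, 130
    print("TSS:", true_skill_statistic(tp, fn, fp, tn))
    print("HSS:", heidke_skill_score(tp, fn, fp, tn))
    # Hypothetical before/after scores, not the paper's values.
    print("Relative increase (%):", relative_increase(0.2, 0.5))
```

A relative increase is sensitive to the baseline: a large percentage gain, such as the one reported for True Skill Statistic, can arise from a low starting score, so the absolute scores matter when interpreting it.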