Log-based Anomaly Detection with Deep Learning: How Far Are We?

Paper Title

Log-based Anomaly Detection with Deep Learning: How Far Are We?

Authors

Van-Hoang Le, Hongyu Zhang

Abstract

Software-intensive systems produce logs for troubleshooting purposes. Recently, many deep learning models have been proposed to automatically detect system anomalies based on log data. These models typically claim very high detection accuracy. For example, most models report an F-measure greater than 0.9 on the commonly-used HDFS dataset. To achieve a profound understanding of how far we are from solving the problem of log-based anomaly detection, in this paper, we conduct an in-depth analysis of five state-of-the-art deep learning-based models for detecting system anomalies on four public log datasets. Our experiments focus on several aspects of model evaluation, including training data selection, data grouping, class distribution, data noise, and early detection ability. Our results point out that all these aspects have significant impact on the evaluation, and that all the studied models do not always work well. The problem of log-based anomaly detection has not been solved yet. Based on our findings, we also suggest possible future work.
