Paper Title

PREVENT: An Unsupervised Approach to Predict Software Failures in Production

Authors

Giovanni Denaro, Rahim Heydarov, Ali Mohebbi, Mauro Pezzè

Abstract

This paper presents PREVENT, an approach for predicting and localizing failures in distributed enterprise applications by combining unsupervised techniques. Software failures can have dramatic consequences in production, and thus predicting and localizing failures is an essential step to activate healing measures that limit the disruptive consequences of failures. At the state of the art, many failures can be predicted from anomalous combinations of system metrics with respect to either rules provided by domain experts or supervised learning models. However, both these approaches limit the effectiveness of current techniques to well-understood types of failures that can be either captured with predefined rules or observed while training supervised models. PREVENT integrates the core ingredients of unsupervised approaches into a novel approach to predict failures and localize failing resources, without requiring either predefined rules or training with observed failures. The results of experimenting with PREVENT on a commercially-compliant distributed cloud system indicate that PREVENT provides more stable and reliable predictions, earlier than or comparably to supervised learning approaches, without requiring long and often impractical training with failures.
