Systematic Assessment of Fuzzers using Mutation Analysis

Paper title

Systematic Assessment of Fuzzers using Mutation Analysis

Authors

Philipp Görz, Björn Mathis, Keno Hassler, Emre Güler, Thorsten Holz, Andreas Zeller, Rahul Gopinath

Abstract

Fuzzing is an important method to discover vulnerabilities in programs. Despite considerable progress in this area in the past years, measuring and comparing the effectiveness of fuzzers is still an open research question. In software testing, the gold standard for evaluating test quality is mutation analysis, which evaluates a test's ability to detect synthetic bugs: If a set of tests fails to detect such mutations, it is expected to also fail to detect real bugs. Mutation analysis subsumes various coverage measures and provides a large and diverse set of faults that can be arbitrarily hard to trigger and detect, thus preventing the problems of saturation and overfitting. Unfortunately, the cost of traditional mutation analysis is exorbitant for fuzzing, as mutations need independent evaluation. In this paper, we apply modern mutation analysis techniques that pool multiple mutations and allow us -- for the first time -- to evaluate and compare fuzzers with mutation analysis. We introduce an evaluation bench for fuzzers and apply it to a number of popular fuzzers and subjects. In a comprehensive evaluation, we show how we can use it to assess fuzzer performance and measure the impact of improved techniques. The required CPU time remains manageable: 4.09 CPU years are needed to analyze a fuzzer on seven subjects and a total of 141,278 mutations. We find that today's fuzzers can detect only a small percentage of mutations, which should be seen as a challenge for future research -- notably in improving (1) detecting failures beyond generic crashes (2) triggering mutations (and thus faults).
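
The two challenges named at the end of the abstract can be illustrated with a toy example. The sketch below is not the paper's evaluation bench and does not pool mutations; the target function, the three hand-written mutants, the fixed corpus, and both oracles are hypothetical, chosen only to show how a mutant can be triggered yet missed by a crash-only oracle, or never triggered at all.

```python
from typing import Callable, Dict, List, Tuple

Outcome = Tuple[str, object]


def original(data: bytes) -> int:
    """Hypothetical target: byte 0 is a length n; return payload byte n."""
    if not data:
        raise ValueError("empty input")
    n = data[0]
    if n == 0 or n > len(data) - 1:
        raise ValueError("bad length")
    return data[n]


def mutant_off_by_one(data: bytes) -> int:
    """Mutation: bounds check weakened, so data[n] can go out of range."""
    if not data:
        raise ValueError("empty input")
    n = data[0]
    if n == 0 or n > len(data):
        raise ValueError("bad length")
    return data[n]


def mutant_wrong_index(data: bytes) -> int:
    """Mutation: returns the wrong byte, but never crashes."""
    if not data:
        raise ValueError("empty input")
    n = data[0]
    if n == 0 or n > len(data) - 1:
        raise ValueError("bad length")
    return data[n - 1]


def mutant_constant(data: bytes) -> int:
    """Mutation: constant 0 replaced by 3; only differs when n is 0 or 3."""
    if not data:
        raise ValueError("empty input")
    n = data[0]
    if n == 3 or n > len(data) - 1:
        raise ValueError("bad length")
    return data[n]


def run(func: Callable[[bytes], int], data: bytes) -> Outcome:
    """Classify one execution as a result, a clean rejection, or a crash."""
    try:
        return ("ok", func(data))
    except ValueError:
        return ("reject", None)
    except Exception as exc:  # any other exception counts as a crash
        return ("crash", type(exc).__name__)


# Stand-in for inputs a fuzzer might have generated (assumption).
CORPUS: List[bytes] = [b"", b"\x01", b"\x01A", b"\x02AB", b"\x05AB"]

MUTANTS: Dict[str, Callable[[bytes], int]] = {
    "off_by_one": mutant_off_by_one,
    "wrong_index": mutant_wrong_index,
    "constant": mutant_constant,
}


def killed_by_crash(mutant: Callable[[bytes], int], corpus: List[bytes]) -> bool:
    """Crash-only oracle: the mutant is detected only if some input crashes it."""
    return any(run(mutant, d)[0] == "crash" for d in corpus)


def killed_by_diff(mutant: Callable[[bytes], int], corpus: List[bytes]) -> bool:
    """Comparison oracle: detected if any outcome differs from the original."""
    return any(run(mutant, d) != run(original, d) for d in corpus)


def mutation_score(oracle: Callable[[Callable[[bytes], int], List[bytes]], bool]) -> float:
    """Fraction of mutants the given oracle detects on the corpus."""
    return sum(oracle(m, CORPUS) for m in MUTANTS.values()) / len(MUTANTS)


if __name__ == "__main__":
    print("crash-only oracle:", mutation_score(killed_by_crash))
    print("comparison oracle:", mutation_score(killed_by_diff))
```

On this corpus the crash-only oracle detects one of the three mutants and the comparison oracle detects two. The `constant` mutant survives both because no input in the corpus reaches the changed behavior.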

Paper title