Paper Title
The Relevance of Classic Fuzz Testing: Have We Solved This One?
Authors
Abstract
As fuzz testing has passed its 30th anniversary, and in the face of the incredible progress in fuzz testing techniques and tools, the question arises whether the classic, basic fuzz technique is still useful and applicable. In that tradition, we have updated the basic fuzz tools and testing scripts and applied them to a large collection of Unix utilities on Linux, FreeBSD, and MacOS. As before, our failure criterion was whether the program crashed or hung. We found that 9 of 74 utilities crashed or hung on Linux, 15 of 78 on FreeBSD, and 12 of 76 on MacOS. A total of 24 different utilities failed across the three platforms. We note that these failure rates are somewhat higher than in our previous 1995, 2000, and 2006 studies of the reliability of command-line utilities. In the basic fuzz tradition, we debugged each failed utility and categorized the causes of the failures. Classic categories of failures, such as pointer and array errors and not checking return codes, were still broadly present in the current results. In addition, we found a couple of new categories of failures appearing. We present examples of these failures to illustrate the programming practices that allowed them to happen. As a side note, we tested the limited number of utilities available in a modern programming language (Rust) and found them to be of no better reliability than the standard ones.
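
The failure criterion described in the abstract (a program either crashes or hangs on random input) can be illustrated with a minimal sketch. This is not the authors' tooling: the function names `random_input`, `classify`, and `fuzz`, the 5-second timeout, and the 4096-byte input limit are all assumptions chosen for illustration. The sketch also assumes a POSIX system, where Python reports a child killed by a signal through a negative return code, and it only feeds input through standard input.

```python
import random
import subprocess


def random_input(rng, max_len=4096):
    """Return an unstructured random byte string of random length.

    max_len is an illustrative limit, not a value taken from the paper.
    """
    length = rng.randint(0, max_len)
    return bytes(rng.getrandbits(8) for _ in range(length))


def classify(argv, data, timeout=5.0):
    """Run argv with data on stdin and return 'hang', 'crash', or 'ok'.

    Assumption: a run that exceeds the timeout counts as a hang, and a
    run terminated by a signal (negative return code on POSIX) counts
    as a crash. A non-zero exit status alone is not a failure.
    """
    try:
        proc = subprocess.run(
            argv,
            input=data,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return "hang"
    return "crash" if proc.returncode < 0 else "ok"


def fuzz(argv, trials=100, seed=0, timeout=5.0):
    """Feed `trials` random inputs to argv and tally the outcomes."""
    rng = random.Random(seed)  # fixed seed so a failing run can be repeated
    counts = {"ok": 0, "crash": 0, "hang": 0}
    for _ in range(trials):
        counts[classify(argv, random_input(rng), timeout)] += 1
    return counts
```

A call such as `fuzz(["sort"], trials=20)` would tally the outcomes for one utility. Fixing the seed keeps the generated inputs reproducible, which matters when a failing input has to be replayed under a debugger to categorize its cause.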