Paper Title
Free Lunch for Testing: Fuzzing Deep-Learning Libraries from Open Source
Authors
Abstract
Deep learning (DL) systems can make our lives much easier, and are therefore gaining more and more attention from both academia and industry. Meanwhile, bugs in DL systems can be disastrous, and can even threaten human lives in safety-critical applications. To date, a large body of research has been dedicated to testing DL models. Interestingly, however, there is still limited work on testing the underlying DL libraries, which are the foundation for building, optimizing, and running DL models. One potential reason is that test generation for these libraries can be rather challenging: their public APIs are mainly exposed in Python, and dynamic typing makes it hard even to determine the API input parameter types automatically. In this paper, we propose FreeFuzz, the first approach to fuzzing DL libraries via mining from open source. More specifically, FreeFuzz obtains code/models from three different sources: 1) code snippets from the library documentation, 2) library developer tests, and 3) DL models in the wild. FreeFuzz then automatically runs all the collected code/models with instrumentation to trace the dynamic information for each covered API, including the types and values of each parameter during invocation and the shapes of input/output tensors. Lastly, FreeFuzz leverages the traced dynamic information to perform fuzz testing for each covered API. An extensive study of FreeFuzz on PyTorch and TensorFlow, two of the most popular DL libraries, shows that FreeFuzz is able to automatically trace valid dynamic information for fuzzing 1158 popular APIs, 9X more than the state-of-the-art LEMON, with 3.5X lower overhead. To date, FreeFuzz has detected 49 bugs in PyTorch and TensorFlow (with 38 already confirmed by developers as previously unknown).
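The trace-then-fuzz idea described in the abstract can be illustrated with a minimal sketch. This is not FreeFuzz's actual implementation: the names `TRACE_DB`, `describe`, `traced`, `mutate`, `fuzz`, and the toy API `toy_pool` are all hypothetical, and the mutation rules are simplified assumptions chosen only to show how recorded argument types and values could seed new inputs.

```python
import functools
import random

TRACE_DB = {}  # hypothetical store: API name -> list of recorded invocations


def describe(value):
    """Summarize one argument: its type name, its value, and its shape if any."""
    info = {"type": type(value).__name__, "value": value}
    shape = getattr(value, "shape", None)  # tensor-like objects expose .shape
    if shape is not None:
        info["shape"] = tuple(shape)
    return info


def traced(api_name):
    """Decorator that records the arguments of every call to the wrapped API."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            record = {
                "args": [describe(a) for a in args],
                "kwargs": {k: describe(v) for k, v in kwargs.items()},
            }
            TRACE_DB.setdefault(api_name, []).append(record)
            return fn(*args, **kwargs)
        return inner
    return wrap


def mutate(info, rng):
    """Return a new value of the same recorded type (simplified value mutation)."""
    value = info["value"]
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return not value
    if isinstance(value, int):
        return value + rng.choice([-1, 1, 1024])
    if isinstance(value, float):
        return value * rng.choice([-1.0, 0.0, 1e6])
    if isinstance(value, str):
        return value + "_x"
    return value  # unknown types are passed through unchanged


def fuzz(api_name, fn, rounds=20, seed=0):
    """Replay recorded calls with mutated arguments and collect exceptions."""
    rng = random.Random(seed)
    failures = []
    for _ in range(rounds):
        record = rng.choice(TRACE_DB[api_name])
        args = [mutate(a, rng) for a in record["args"]]
        kwargs = {k: mutate(v, rng) for k, v in record["kwargs"].items()}
        try:
            fn(*args, **kwargs)
        except Exception as exc:
            failures.append((args, kwargs, repr(exc)))
    return failures


def toy_pool(size, stride=1):
    """Toy stand-in for a library API (not a real PyTorch/TensorFlow function)."""
    return size // stride  # fails when a mutated stride becomes 0


# Step 1: run "collected code" through the instrumented API to record a trace.
traced_pool = traced("toy_pool")(toy_pool)
traced_pool(8, stride=1)

# Step 2: fuzz the uninstrumented API using the recorded types and values.
failures = fuzz("toy_pool", toy_pool)
```

In this sketch, the trace tells the fuzzer that `stride` is an `int`, so mutations stay type-valid while still reaching edge cases such as `stride=0`. A real tool would wrap library APIs in place and would also mutate tensor shapes and dtypes, which this example omits.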