Paper Title

DEAR: A Novel Deep Learning-based Approach for Automated Program Repair

Authors

Yi Li, Shaohua Wang, Tien N. Nguyen

Abstract

Existing deep learning (DL)-based automated program repair (APR) models are limited in fixing general software defects. We present DEAR, a DL-based approach that supports fixing general bugs that require dependent changes, made at once, to one or multiple consecutive statements in one or multiple hunks of code. We first design a novel fault localization (FL) technique for multi-hunk, multi-statement fixes that combines traditional spectrum-based (SB) FL with deep learning and data-flow analysis. It takes the buggy statements returned by the SBFL model, detects the buggy hunks to be fixed at once, and expands a buggy statement s in a hunk to include other suspicious statements around s. We then design a two-tier, tree-based LSTM model that incorporates cycle training and uses a divide-and-conquer strategy to learn proper code transformations for fixing multiple statements in a suitable fixing context consisting of the surrounding subtrees. We conducted several experiments to evaluate DEAR on three datasets: Defects4J (395 bugs), BigFix (+26k bugs), and CPatMiner (+44k bugs). On the Defects4J dataset, DEAR outperforms the baselines by 42% to 683% in terms of the number of auto-fixed bugs with only the top-1 patches. On the BigFix dataset, it fixes 31 to 145 more bugs than existing DL-based APR models with the top-1 patches. On the CPatMiner dataset, among 667 fixed bugs, 169 (25.3%) are multi-hunk/multi-statement bugs. DEAR fixes 71 and 164 more bugs, including 52 and 61 more multi-hunk/multi-statement bugs, than the state-of-the-art DL-based APR models.
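
The hunk-expansion step mentioned in the abstract can be pictured with a minimal sketch. This is not the paper's method: the abstract says the expansion relies on deep learning and data-flow analysis, whereas the sketch below swaps in a plain suspiciousness threshold purely for illustration. The function name `expand_hunk`, the `scores` list, and the `threshold` parameter are all hypothetical.

```python
from typing import List, Tuple


def expand_hunk(scores: List[float], seed: int, threshold: float = 0.5) -> Tuple[int, int]:
    """Grow a hunk around statement index `seed` (hypothetical helper).

    `scores[i]` is an assumed per-statement suspiciousness score, for
    example one produced by a spectrum-based fault localization tool.
    The hunk extends left and right while the neighbouring statement's
    score stays at or above `threshold` (an assumption of this sketch).
    Returns the inclusive (start, end) statement indices of the hunk.
    """
    if not 0 <= seed < len(scores):
        raise IndexError("seed is outside the statement list")
    start = end = seed
    # Extend towards earlier statements while they remain suspicious.
    while start > 0 and scores[start - 1] >= threshold:
        start -= 1
    # Extend towards later statements while they remain suspicious.
    while end < len(scores) - 1 and scores[end + 1] >= threshold:
        end += 1
    return start, end


# Made-up scores for six consecutive statements; statement 3 is the seed.
scores = [0.1, 0.2, 0.7, 0.9, 0.6, 0.1]
print(expand_hunk(scores, seed=3))  # prints (2, 4)
```

With these made-up scores, the seed statement at index 3 grows into a hunk covering statements 2 through 4, which would then be handed to the repair model as one unit to fix at once.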
