Paper Title
Augmenting Machine Learning with Information Retrieval to Recommend Real Cloned Code Methods for Code Completion
Authors
Abstract
Software developers frequently reuse source code from repositories because it saves development time and effort. Code clones accumulated in these repositories therefore often represent repeated functionality and are candidates for reuse in exploratory or rapid development. In previous work, we introduced DeepClone, a deep neural network model trained by fine-tuning the GPT-2 model on the BigCloneBench dataset to predict code clone methods. The probabilistic nature of DeepClone's output generation can lead to syntax and logic errors that require manual editing of the output before final reuse. In this paper, we propose a novel approach that applies an information retrieval (IR) technique on top of DeepClone's output to recommend real clone methods that closely match the predicted output. We have quantitatively evaluated our strategy, showing that the proposed approach significantly improves the quality of recommendation.
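The abstract does not say which IR technique is used, so the following is only a minimal sketch of the general idea, assuming TF-IDF weighting with cosine similarity as a stand-in. It treats a generated (possibly erroneous) method as a query and returns the closest real method from a corpus of known clones. The function names (`tokenize`, `build_index`, `recommend`), the tokenizer, and the tiny corpus are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(code):
    # Assumption: a crude lexer that splits code into identifiers,
    # numbers, and single punctuation characters.
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\sA-Za-z_\d]", code)


def build_index(corpus):
    # Build smoothed IDF weights and one TF-IDF vector per real method.
    docs = [Counter(tokenize(method)) for method in corpus]
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(doc.keys())
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: f * idf[t] for t, f in doc.items()} for doc in docs]
    return idf, vectors


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(predicted, corpus, top_k=1):
    # Rank real methods by similarity to the model's predicted output.
    idf, vectors = build_index(corpus)
    counts = Counter(tokenize(predicted))
    query = {t: f * idf[t] for t, f in counts.items() if t in idf}
    scored = sorted(
        ((cosine(query, vec), i) for i, vec in enumerate(vectors)),
        reverse=True,
    )
    return [(corpus[i], score) for score, i in scored[:top_k]]


# Hypothetical corpus of real clone methods (Java, as in BigCloneBench).
corpus = [
    "public static void copyFile(File src, File dst) throws IOException "
    "{ Files.copy(src.toPath(), dst.toPath()); }",
    "public static int sum(int[] xs) "
    "{ int s = 0; for (int x : xs) s += x; return s; }",
    "public static String readUrl(String u) throws IOException "
    "{ return new String(new URL(u).openStream().readAllBytes()); }",
]

# Hypothetical model output with syntax errors (missing comma and ')').
predicted = (
    "public static void copyFile(File src, File dst) "
    "{ Files.copy(src.toPath() dst.toPath() }"
)

best_method, best_score = recommend(predicted, corpus, top_k=1)[0]
print(best_method)
```

Because the retrieved method comes from the corpus rather than from the generator, it is syntactically intact even when the query is not, which is the motivation the abstract gives for adding a retrieval step after generation.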