CodeGen-Test: An Automatic Code Generation Model Integrating Program Test Information

Paper Title

CodeGen-Test: An Automatic Code Generation Model Integrating Program Test Information

Authors

Zhong, Maosheng; Liu, Gen; Li, Hongwei; Kuang, Jiangling; Zeng, Jinshan; Wang, Mingwen

Abstract

Automatic code generation produces program code from a given natural language description. The current mainstream approach uses a neural network to encode the natural language description, has the decoder output an abstract syntax tree (AST), and then converts the AST into program code. Although the generated code largely conforms to the relevant syntax rules, two problems remain overlooked. First, program testing, an essential step in a complete code implementation process, is missing. Second, these approaches focus only on the syntactic compliance of the generated code and ignore the more important functional requirements of the program. This paper proposes the CodeGen-Test model, which adds a program testing step and incorporates program test information to iteratively generate code that meets the program's functional requirements, thereby improving the quality of code generation. The paper also proposes a new evaluation metric, test accuracy (Test-Acc), defined as the proportion of generated code that passes the program tests. Unlike previous evaluation metrics, which assess code generation quality only in terms of character similarity, Test-Acc assesses it in terms of program functionality. The paper evaluates the CodeGen-Test model on the Python dataset "Hearthstone Legend". The experimental results show that the proposed method effectively improves the quality of the generated code: compared with the best existing model, CodeGen-Test improves BLEU by 0.2%, ROUGE-L by 0.3%, and Test-Acc by 6%.
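The Test-Acc metric described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the authors run their program tests, so everything below is an assumption made for illustration: the helper names `passes_tests` and `compute_test_acc`, the representation of a test as a callable that inspects the executed namespace, and the toy `add` programs. It is not the paper's evaluation harness.

```python
from typing import Callable, Dict, List

# A test is a callable that receives the namespace produced by running
# the generated program and returns True on success (hypothetical design).
Test = Callable[[Dict[str, object]], bool]


def passes_tests(source: str, tests: List[Test]) -> bool:
    """Return True if `source` executes and every test returns True."""
    namespace: Dict[str, object] = {}
    try:
        # exec runs arbitrary code; use only on trusted or sandboxed input.
        exec(source, namespace)
        return all(test(namespace) for test in tests)
    except Exception:
        # Syntax errors, runtime errors, and missing names all count as failures.
        return False


def compute_test_acc(programs: List[str], tests_per_program: List[List[Test]]) -> float:
    """Proportion of generated programs that pass their program tests."""
    if not programs:
        return 0.0
    passed = sum(
        passes_tests(source, tests)
        for source, tests in zip(programs, tests_per_program)
    )
    return passed / len(programs)


# Toy data (illustrative only): one correct program, one functionally
# wrong program, and one program that does not even parse.
generated = [
    "def add(a, b):\n    return a + b\n",
    "def add(a, b):\n    return a - b\n",
    "def add(a, b) return a + b\n",
]
add_tests: List[Test] = [lambda ns: ns["add"](2, 3) == 5]

score = compute_test_acc(generated, [add_tests] * len(generated))
print(f"Test-Acc = {score:.3f}")
```

In this toy case the second program is syntactically valid and textually close to the first, so a character-similarity metric would rate it highly, yet it fails the functional test. That gap is what the abstract says Test-Acc is meant to capture.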
