Paper Title
Towards Top-Down Automated Development in Limited Scopes: A Neuro-Symbolic Framework from Expressibles to Executables
Authors
Abstract
Deep code generation is a topic in deep learning for software engineering (DL4SE) that uses neural models to generate code for intended functions. Because end-to-end neural methods lack domain knowledge and awareness of software hierarchy, they tend to perform poorly on project-level tasks. To systematically explore potential improvements to code generation, we let it participate in the whole top-down development process from \emph{expressibles} to \emph{executables}, which is possible in limited scopes. In this process, it benefits from massive samples, features, and knowledge. As the foundation, we suggest building a taxonomy on code data, namely a code taxonomy, which leverages the categorization of code information. Moreover, we introduce a three-layer semantic pyramid (SP) to associate text data with code data. It identifies information at different abstraction levels, and thus introduces domain knowledge on development and reveals the hierarchy of software. Furthermore, we propose a semantic pyramid framework (SPF) as the approach, focusing on software of high modularity and low complexity. SPF divides the code generation process into stages and reserves spots for potential interactions. In addition, we outline preliminary applications in software development to validate the neuro-symbolic framework.