Paper Title

COSEA: Convolutional Code Search with Layer-wise Attention

Paper Authors

Hao Wang, Jia Zhang, Yingce Xia, Jiang Bian, Chao Zhang, Tie-Yan Liu

Paper Abstract

Semantic code search, which aims to retrieve code snippets relevant to a given natural language query, has attracted many research efforts with the purpose of accelerating software development. The huge amount of publicly available online code repositories has prompted the employment of deep learning techniques to build state-of-the-art code search models. In particular, these models leverage deep neural networks to embed code snippets and queries into a unified semantic vector space, and then use the similarity between the code and query vectors to approximate the semantic correlation between the code and the query. However, most existing studies overlook the code's intrinsic structural logic, which contains a wealth of semantic information, and thus fail to capture the intrinsic features of code. In this paper, we propose a new deep learning architecture, COSEA, which leverages convolutional neural networks with layer-wise attention to capture the code's valuable intrinsic structural logic. To further increase the learning efficiency of COSEA, we propose a variant of the contrastive loss for training the code search model, in which the ground-truth code should be distinguished from the most similar negative sample. We have implemented a prototype of COSEA. Extensive experiments on existing public Python and SQL datasets demonstrate that COSEA achieves significant improvements over state-of-the-art code search methods.
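The abstract names two concrete mechanisms, so a hedged sketch may help. First, one plausible reading of "convolutional neural networks with layer-wise attention" (an illustration, not the authors' released implementation): each convolutional layer produces a pooled summary of the code tokens, and learned attention weights decide how much each layer contributes to the final code embedding. All names here (LayerwiseAttnCNNEncoder, layer_score, etc.) are hypothetical.

```python
import torch
import torch.nn as nn

class LayerwiseAttnCNNEncoder(nn.Module):
    """Hypothetical sketch: a stack of 1-D convolutions over token
    embeddings, with attention over the per-layer summaries."""

    def __init__(self, vocab_size, dim=128, num_layers=3, kernel=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(dim, dim, kernel, padding=kernel // 2)
            for _ in range(num_layers)
        )
        self.layer_score = nn.Linear(dim, 1)  # scores each layer's summary

    def forward(self, tokens):                     # tokens: (B, T) int ids
        x = self.embed(tokens).transpose(1, 2)     # (B, D, T)
        summaries = []
        for conv in self.convs:
            x = torch.relu(conv(x))
            summaries.append(x.max(dim=2).values)  # max-pool: (B, D) per layer
        h = torch.stack(summaries, dim=1)          # (B, L, D)
        attn = torch.softmax(self.layer_score(h), dim=1)  # weights over layers
        return (attn * h).sum(dim=1)               # (B, D) code embedding
```

Second, the contrastive-loss variant: for each query, the ground-truth code must outscore the most similar in-batch negative by a margin. The margin value and the function name below are assumptions.

```python
import torch
import torch.nn.functional as F

def hard_negative_margin_loss(query_emb, code_emb, margin=0.5):
    """Sketch of the loss described in the abstract: contrast the
    ground-truth code against the *hardest* in-batch negative.

    query_emb, code_emb: (B, D) tensors; row i of each is a matching pair.
    """
    q = F.normalize(query_emb, dim=-1)
    c = F.normalize(code_emb, dim=-1)
    sim = q @ c.t()                        # (B, B) cosine similarities

    pos = sim.diagonal()                   # similarity to ground-truth code
    eye = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    hardest_neg = sim.masked_fill(eye, float("-inf")).max(dim=1).values

    # Hinge: the positive must beat the hardest negative by `margin`.
    return F.relu(margin - pos + hardest_neg).mean()
```

At retrieval time, the same embeddings serve search directly: encode the query, compute cosine similarity against precomputed code embeddings, and return the top-ranked snippets.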
