CoCoMIC: Code Completion By Jointly Modeling In-file and Cross-file Context

Paper title

CoCoMIC: Code Completion By Jointly Modeling In-file and Cross-file Context

Authors

Yangruibo Ding, Zijian Wang, Wasi Uddin Ahmad, Murali Krishna Ramanathan, Ramesh Nallapati, Parminder Bhatia, Dan Roth, Bing Xiang

Abstract

While pre-trained language models (LM) for code have achieved great success in code completion, they generate code conditioned only on the contents within the file, i.e., in-file context, but ignore the rich semantics in other files within the same project, i.e., cross-file context, a critical source of information that is especially useful in modern modular software development. Such overlooking constrains code language models' capacity in code completion, leading to unexpected behaviors such as generating hallucinated class member functions or function calls with unexpected arguments. In this work, we develop a cross-file context finder tool, CCFINDER, that effectively locates and retrieves the most relevant cross-file context. We propose CoCoMIC, a framework that incorporates cross-file context to learn the in-file and cross-file context jointly on top of pretrained code LMs. CoCoMIC successfully improves the existing code LM with a 33.94% relative increase in exact match and a 28.69% relative increase in identifier matching for code completion when the cross-file context is provided.
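
To make the notion of cross-file context concrete, here is a minimal Python sketch of one way such context could be located: resolve the names a file imports from other project files and summarize their definitions. This is an illustration only, not the CCFINDER tool or the CoCoMIC model. The function names (`imported_names`, `signatures`, `cross_file_context`), the in-memory `project` dictionary, and the summary format are all assumptions made for this example, and only `from module import name` statements are handled.

```python
import ast
from typing import Dict, List, Tuple


def imported_names(source: str) -> List[Tuple[str, str]]:
    """Return (name, module) pairs for every `from module import name`."""
    pairs = []
    for node in ast.walk(ast.parse(source)):
        # Relative imports such as `from . import x` have module=None; skip them.
        if isinstance(node, ast.ImportFrom) and node.module:
            for alias in node.names:
                pairs.append((alias.name, node.module))
    return pairs


def signatures(source: str) -> Dict[str, str]:
    """Summarize top-level functions and classes defined in a module."""
    out = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            args = ", ".join(a.arg for a in node.args.args)
            out[node.name] = f"def {node.name}({args})"
        elif isinstance(node, ast.ClassDef):
            methods = [n.name for n in node.body if isinstance(n, ast.FunctionDef)]
            out[node.name] = f"class {node.name}: methods={methods}"
    return out


def cross_file_context(current_source: str, project: Dict[str, str]) -> List[str]:
    """Collect summaries of project-defined entities imported by the current file.

    `project` is a hypothetical mapping from module name to source text.
    """
    context = []
    for name, module in imported_names(current_source):
        module_source = project.get(module)
        if module_source is None:
            continue  # not part of the project (for example, a standard-library module)
        summary = signatures(module_source).get(name)
        if summary:
            context.append(f"# from {module}: {summary}")
    return context


# Toy project with a single other file, plus the file being completed.
project = {
    "shapes": (
        "class Circle:\n"
        "    def __init__(self, r):\n"
        "        self.r = r\n"
        "    def area(self):\n"
        "        return 3.14159 * self.r ** 2\n"
    ),
}
current = "from shapes import Circle\nc = Circle(2)\n"

ctx = cross_file_context(current, project)
print(ctx)
```

With a summary like this available, a completion system can see that `Circle` defines `area` instead of guessing a member function that does not exist, which is the kind of hallucination the abstract describes. How the retrieved context is then combined with the in-file context is the part the paper's framework addresses and is not shown here.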
