Paper Title

NaturalCC: A Toolkit to Naturalize the Source Code Corpus

Authors

Yao Wan, Yang He, Jian-Guo Zhang, Yulei Sui, Hai Jin, Guandong Xu, Caiming Xiong, Philip S. Yu

Abstract

We present NaturalCC, an efficient and extensible toolkit that bridges the gap between natural language and programming language and facilitates research on big code analysis. Using NaturalCC, researchers from both the natural language and programming language communities can quickly and easily reproduce state-of-the-art baselines and implement their own approaches. NaturalCC is built upon Fairseq and PyTorch, providing (1) efficient computation with multi-GPU and mixed-precision data processing for fast model training, (2) a modular and extensible framework that makes it easy to reproduce or implement an approach for big code analysis, and (3) a command line interface and a graphical user interface to demonstrate each model's performance. Currently, we have included several state-of-the-art baselines across different tasks (e.g., code completion, code comment generation, and code retrieval) for demonstration. The video of this demo is available at https://www.youtube.com/watch?v=q4W5VSI-u3E&t=25s.
