Paper Title
Code Prediction by Feeding Trees to Transformers
Paper Authors
Paper Abstract
We advance the state of the art in the accuracy of code prediction (next-token prediction) used in autocomplete systems. First, we report that the recently proposed Transformer architecture, even used out of the box, outperforms previous neural and non-neural systems for code prediction. We then show that by making the Transformer architecture aware of the syntactic structure of code, we further increase the margin by which a Transformer-based system outperforms previous systems. With this, it outperforms the accuracy of an RNN-based system (similar to Hellendoorn et al., 2018) by 18.3%, the Deep3 system (Raychev et al., 2016) by 14.1%, and an adaptation of Code2Seq (Alon et al., 2018) for code prediction by 14.4%. We present in the paper several ways of communicating the code structure to the Transformer, which is fundamentally built for processing sequence data. We provide a comprehensive experimental evaluation of our proposal, along with alternative design choices, on a standard Python dataset as well as on a Facebook-internal Python corpus. Our code and data preparation pipeline will be made available in open source.
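As an illustration of the general idea of feeding trees to a sequence model, the sketch below linearizes a Python AST with a pre-order traversal so that a Transformer-style next-token predictor could consume it. The traversal order, token format, and the linearize helper are illustrative assumptions, not the paper's exact pipeline; only Python's standard ast module is used.

import ast

def linearize(node):
    # Yield (node type, leaf value) pairs in pre-order over the AST.
    # Leaf values are read only from Name/Attribute nodes in this sketch;
    # a real pipeline would also handle constants, arguments, etc.
    value = getattr(node, "id", None) or getattr(node, "attr", None)
    yield (type(node).__name__, value)
    for child in ast.iter_child_nodes(node):
        yield from linearize(child)

tree = ast.parse("x = foo(1) + bar.baz")
tokens = list(linearize(tree))
# tokens begins with [('Module', None), ('Assign', None), ('Name', 'x'), ...];
# a sequence model trained on such streams predicts the next (type, value) token.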