Are Machine Programming Systems using Right Source-Code Measures to Select Code Repositories?

Paper Title

Are Machine Programming Systems using Right Source-Code Measures to Select Code Repositories?

Authors

Hasabnis, Niranjan

Abstract

Machine programming (MP) is an emerging field at the intersection of deterministic and probabilistic computing that aims to assist software and hardware engineers, among other applications. Along with powerful compute resources, MP systems often rely on vast amounts of open-source code to learn interesting properties about code and programming and to solve problems in areas such as debugging, code recommendation, and auto-completion. Unfortunately, several existing MP systems either do not consider the quality of code repositories when selecting them, or select them using quality measures that differ from those typically used in the software engineering community. As such, the impact of code repository quality on the performance of these systems needs to be studied. In this preliminary paper, we evaluate the impact of repositories of differing quality on the performance of a candidate MP system, ControlFlag. Towards that objective, we develop a framework, named GitRank, that ranks open-source repositories on quality, maintainability, and popularity by leveraging existing research on this topic. We then apply GitRank to evaluate the correlation between the quality measures used by the candidate MP system and those used by our framework. Our preliminary results reveal some correlation between the quality measures used in GitRank and ControlFlag's performance, suggesting that some of the measures used in GitRank are applicable to ControlFlag. However, the results also raise questions about the right quality measures for selecting the code repositories used in MP systems. We believe that our findings also offer interesting insights into the code quality measures that affect the performance of MP systems.
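The abstract does not say how GitRank combines its measures or how the correlation is computed. The sketch below is a minimal illustration under several assumptions:

- Each repository's quality, maintainability, and popularity measures are pre-normalized to [0, 1].
- The measures are combined by a weighted sum.
- The resulting scores are compared against a per-repository performance metric of the MP system using Spearman's rank correlation.

All names (`RepoScores`, `composite_score`, `rank_repos`, `average_ranks`, `spearman`) and the toy values are hypothetical. This is not GitRank's actual implementation.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List, Sequence


@dataclass
class RepoScores:
    """Hypothetical per-repository measures, each assumed pre-normalized to [0, 1]."""
    name: str
    quality: float
    maintainability: float
    popularity: float


def composite_score(repo: RepoScores, weights: Dict[str, float]) -> float:
    """Weighted sum of the three measure families (the weighting scheme is an assumption)."""
    return (weights["quality"] * repo.quality
            + weights["maintainability"] * repo.maintainability
            + weights["popularity"] * repo.popularity)


def rank_repos(repos: List[RepoScores], weights: Dict[str, float]) -> List[str]:
    """Return repository names ordered from highest to lowest composite score."""
    ordered = sorted(repos, key=lambda r: composite_score(r, weights), reverse=True)
    return [r.name for r in ordered]


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ascending ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    # Sums of squared deviations of each rank vector from its mean.
    ss_x = sum((a - mx) ** 2 for a in rx)
    ss_y = sum((b - my) ** 2 for b in ry)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / sqrt(ss_x * ss_y)


if __name__ == "__main__":
    # Illustrative toy values only; these are not data or results from the paper.
    weights = {"quality": 1 / 3, "maintainability": 1 / 3, "popularity": 1 / 3}
    repos = [
        RepoScores("repo_a", 0.9, 0.8, 0.7),
        RepoScores("repo_b", 0.4, 0.5, 0.9),
        RepoScores("repo_c", 0.2, 0.3, 0.1),
    ]
    mp_metric = [0.85, 0.60, 0.30]  # hypothetical per-repo MP-system performance
    scores = [composite_score(r, weights) for r in repos]
    print(rank_repos(repos, weights))
    print(spearman(scores, mp_metric))
```

A rank correlation is used here because it compares orderings rather than raw values, which suits a ranking framework. Pearson or Kendall coefficients would be equally plausible choices, and the paper may use a different method altogether.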
