Paper title
Finding Reusable Machine Learning Components to Build Programming Language Processing Pipelines
Paper authors
Abstract
Programming Language Processing (PLP) using machine learning has improved substantially in the past few years, and a growing number of people are interested in exploring this promising field. However, it is challenging for new researchers and developers to find the right components to construct their own machine learning pipelines, given the diverse PLP tasks to be solved, the large number of datasets and models being released, and the complex compilers and tools involved. To improve the findability, accessibility, interoperability, and reusability (FAIRness) of machine learning components, we collect and analyze a set of representative papers in the domain of machine learning-based PLP. We then identify and characterize key concepts, including PLP tasks, model architectures, and supportive tools. Finally, we show example use cases that leverage reusable components to construct machine learning pipelines for a set of PLP tasks.