Paper Title
Effect of Technical and Social Factors on Pull Request Quality for the NPM Ecosystem
Authors
Abstract
Pull request (PR) based development, the norm on social coding platforms, entails the challenge of evaluating contributions from often unfamiliar developers across the open source ecosystem and, conversely, of submitting a contribution to a project with unfamiliar maintainers. Previous studies suggest that the decision to accept or reject a PR may be influenced by a diverging set of technical and social factors, but they often focus on relatively few projects and do not consider ecosystem-wide measures or possible non-monotonic relationships between the predictors and PR acceptance probability. We aim to shed light on this important decision-making process by testing which measures significantly affect the probability of PR acceptance across a substantial fraction of a large ecosystem, ranking them by their relative importance in predicting PR acceptance, and determining the shape of the functions that map each predictor to PR acceptance. We proposed seven hypotheses regarding which technical and social factors might affect PR acceptance and created 17 measures based on them. Our dataset consisted of 470,925 PRs from 3,349 popular NPM packages and the 79,128 GitHub users who created them. We tested which of the measures affect PR acceptance and ranked the significant measures by their importance in a predictive model. Our predictive model had an AUC of 0.94, and 15 of the 17 measures were found to matter, including five novel ecosystem-wide measures. Measures describing the number of PRs submitted to a repository and the fraction of those that get accepted, along with signals about the PR review phase, were the most significant. We also discovered that only four predictors have a linear influence on the PR acceptance probability, while the others showed a more complicated response.