Paper title
On the diversity and frequency of code related to mathematical formulas in real-world Java projects
Paper authors
Paper abstract
In this paper, the term formula code refers to fragments of source code that implement a mathematical formula. We present empirical studies that analyze the diversity and frequency of formula code in open-source software projects. In an exploratory study, we investigated what kinds of formulas are implemented in real-world Java projects and derived syntactic patterns and constraints. We refined these patterns for sum and product formulas to automatically detect formula code in software archives and to reconstruct the implemented formula in mathematical notation. In a quantitative study of a large sample of engineered Java projects on GitHub, we analyzed the frequency of formula code and estimated that one in 700 lines of code in this sample implements a sum or product formula. For a sample of scientific-computing projects, we found that one in 100 lines of code implements a sum or product formula. To assess the need for tool support, we investigated how helpful comments are for program understanding in a sample of formula-code fragments, and we performed an online survey. Our findings provide first insights into the characteristics of formula code that can motivate further studies on the role of formula code in software projects and on the design of formula-related tools.
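To make the notion of formula code concrete, the following Java class is a minimal sketch containing hypothetical examples of our own. The class and method names are illustrative and are not taken from the studied projects or from the authors' detection tool. Each method implements one formula: a counting loop that updates an accumulator with `+=` realizes a sum, and one that updates it with `*=` realizes a product. The comments state the implemented formula in mathematical notation. The loop-with-accumulator shape is assumed here as a typical case and is not the paper's exact pattern definition.

```java
// Hypothetical formula-code fragments for illustration only.
public final class FormulaCodeExamples {

    private FormulaCodeExamples() {}

    /**
     * Sum formula (dot product): sum_{i=0}^{n-1} x_i * y_i.
     * The accumulator `sum` is updated with += inside a counting loop.
     */
    public static double dot(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

    /**
     * Product formula (factorial): n! = prod_{k=1}^{n} k.
     * The accumulator `prod` is updated with *= inside a counting loop.
     * Note: the long result overflows for n > 20.
     */
    public static long factorial(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        long prod = 1L;
        for (int k = 1; k <= n; k++) {
            prod *= k;
        }
        return prod;
    }

    public static void main(String[] args) {
        System.out.println(dot(new double[] {1, 2, 3}, new double[] {4, 5, 6})); // prints 32.0
        System.out.println(factorial(5)); // prints 120
    }
}
```

Running `main` prints `32.0` (1·4 + 2·5 + 3·6) and `120` (5!). Both methods keep the formula's structure visible in the code: a loop bound that matches the index range and a single accumulator whose update matches the summand or factor.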