Paper Title
How Do Programmers Express High-Level Concepts using Primitive Data Types?
Authors
Abstract
We investigated how programmers express high-level concepts, such as path names and coordinates, using primitive data types. Although heavy reliance on primitive data types is sometimes criticized as a bad smell, it remains common practice among programmers. We propose a novel way to accurately identify expressions for certain predefined concepts by examining API calls. We defined twelve conceptual types used in the Java Standard API and obtained expressions for each conceptual type from 26 open source projects. Based on these expressions, we trained a decision tree-based classifier, which achieved an F-score of 83% in predicting the conceptual type of a given expression. Our results indicate that a conceptual type can be inferred from source code reasonably well once enough examples are given. The resulting classifier can be used for potential bug detection, test case generation, and documentation.
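To make the idea of identifying concepts through API calls more concrete, the following is a minimal Java sketch and not the authors' implementation. It assumes that an argument passed to a known Java Standard API method can be labeled with the concept that parameter represents. The class name `ConceptLabeler`, the key format, and the labels `PATH_NAME`, `X_COORDINATE`, and `Y_COORDINATE` are hypothetical; the abstract does not list the twelve conceptual types or the APIs that were examined.

```java
import java.util.HashMap;
import java.util.Map;

public class ConceptLabeler {
    // Hypothetical table: "API method#argument index" -> concept label.
    // The entries are illustrative assumptions, not taken from the paper.
    private static final Map<String, String> API_CONCEPTS = new HashMap<>();

    static {
        // The String argument of new java.io.File(String) is a path name.
        API_CONCEPTS.put("java.io.File.<init>#0", "PATH_NAME");
        // The int arguments of new java.awt.Point(int, int) are coordinates.
        API_CONCEPTS.put("java.awt.Point.<init>#0", "X_COORDINATE");
        API_CONCEPTS.put("java.awt.Point.<init>#1", "Y_COORDINATE");
    }

    // Returns the concept label for the expression passed as the given
    // argument of the given API method, or "UNKNOWN" if the call is not listed.
    public static String conceptOf(String apiMethod, int argIndex) {
        return API_CONCEPTS.getOrDefault(apiMethod + "#" + argIndex, "UNKNOWN");
    }

    public static void main(String[] args) {
        // For a call site such as: new File(baseDir + "/" + name)
        // the expression baseDir + "/" + name would be labeled as a path name.
        System.out.println(conceptOf("java.io.File.<init>", 0));
        System.out.println(conceptOf("java.awt.Point.<init>", 1));
        System.out.println(conceptOf("java.lang.Math.abs", 0));
    }
}
```

Expressions labeled this way could then serve as training examples for a classifier, so that a concept can be predicted even for an expression that never reaches one of the listed API calls.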