Paper Title

Heterogeneous Graph Neural Networks for Software Effort Estimation

Authors

Hung Phan, Ali Jannesari

Abstract

Software effort can be measured in story points [35]. Current approaches for automatically estimating story points focus on applying pre-trained embedding models and deep learning for text regression, which requires expensive embedding models. We propose HeteroSP, a tool for estimating story points from the textual input of Agile software project issues. We select GPT2SP [12] and Deep-SE [8] as baselines for comparison. First, from an analysis of the story point dataset [8], we conclude that software issues are in fact a mixture of natural language sentences and quoted code snippets, and that they suffer from a large vocabulary. Second, we provide a module to normalize the input text, covering both the words and the code tokens of software issues. Third, we design an algorithm that converts an input software issue into a graph with different types of nodes and edges. Fourth, we construct a heterogeneous graph neural network model, using fastText [6] to build the initial node embeddings, to learn and predict the story points of new issues. We compare against our baselines over three estimation scenarios: within-project, cross-project within-repository, and cross-project cross-repository. We achieve an average Mean Absolute Error (MAE) of 2.38, 2.61, and 2.63 for the three scenarios, respectively. We outperform GPT2SP in two of the three scenarios, and we outperform Deep-SE in the most challenging scenario with significantly less running time. We also compare our approach with several homogeneous graph neural network models; the results show that the heterogeneous graph neural network model outperforms the homogeneous models in story point estimation. For time performance, we take about 570 seconds in total across the three processes: node embedding initialization, model construction, and story point estimation.
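The second and third steps of the pipeline can be illustrated with a minimal sketch. This is not the authors' implementation: the code-token heuristic, the node/edge naming, and the toy hash-based embedder (standing in for fastText vectors) are all hypothetical, chosen only to show how an issue text might become a graph with typed nodes and typed edges.

```python
import re

def embed(token, dim=8):
    # Toy stand-in for a fastText embedding: a deterministic hash-based vector.
    # A real pipeline would look the token up in a trained fastText model.
    return [((hash(token) >> (4 * i)) & 0xF) / 15.0 for i in range(dim)]

def issue_to_hetero_graph(text):
    """Split an issue into "word" vs. "code" tokens and link consecutive tokens.

    Returns (nodes, edges) where each node is (node_type, token, embedding)
    and each edge is (src_index, dst_index, edge_type). Edge types depend on
    the node types they connect, which is what makes the graph heterogeneous.
    """
    nodes = []
    for tok in text.split():
        # Hypothetical heuristic: camelCase, snake_case, or dotted/parenthesized
        # identifiers are treated as quoted code tokens.
        is_code = bool(re.search(r"[_.()]|[a-z][A-Z]", tok))
        nodes.append(("code" if is_code else "word", tok, embed(tok)))
    edges = []
    for i in range(len(nodes) - 1):
        edge_type = f"{nodes[i][0]}-{nodes[i + 1][0]}"
        edges.append((i, i + 1, edge_type))
    return nodes, edges

nodes, edges = issue_to_hetero_graph(
    "Fix NullPointerException in UserService.login when token expires"
)
```

In a full system, the typed nodes and edges would be fed to a heterogeneous GNN (e.g. one relation-specific message-passing step per edge type) topped with a regression head that outputs the story point estimate.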
