The Use of NLP-Based Text Representation Techniques to Support Requirement Engineering Tasks: A Systematic Mapping Review

Paper Title

The Use of NLP-Based Text Representation Techniques to Support Requirement Engineering Tasks: A Systematic Mapping Review

Authors

Sonbol, Riad; Rebdawi, Ghaida; Ghneim, Nada

Abstract

Natural Language Processing (NLP) is widely used to support the automation of different Requirements Engineering (RE) tasks. Most of the proposed approaches start with various NLP steps that analyze requirements statements, extract their linguistic information, and convert them to easy-to-process representations, such as lists of features or embedding-based vector representations. These NLP-based representations are usually used at a later stage as inputs for machine learning techniques or rule-based methods. Thus, requirements representations play a major role in determining the accuracy of different approaches. In this paper, we conducted a survey in the form of a systematic literature mapping (classification) to find out (1) which representations are used in the RE task literature, (2) what the main focus of these works is, (3) what the main research directions in this domain are, and (4) what the gaps and potential future directions are. After compiling an initial pool of 2,227 papers and applying a set of inclusion/exclusion criteria, we obtained a final pool containing 104 relevant papers. Our survey shows that the research direction has changed from the use of lexical and syntactic features to the use of advanced embedding techniques, especially in the last two years. Advanced embedding representations have proved effective in most RE tasks (such as requirement analysis, extracting requirements from reviews and forums, and semantic-level quality tasks). However, representations based on lexical and syntactic features are still more appropriate for other RE tasks (such as modeling and syntax-level quality tasks), since they provide the information required by the rules and regular expressions used when handling these tasks. In addition, we identify four gaps in the existing literature, why they matter, and how future research can begin to address them.
