Method of noun phrase detection in Ukrainian texts

Title

Method of noun phrase detection in Ukrainian texts

Authors

Pogorilyy, S. D., Kramov, A. A.

Abstract

Introduction. Natural language processing deals with AI-complete tasks that cannot be solved by traditional algorithmic means. Such tasks are commonly addressed using machine learning methods and computational linguistics tools. One of the text preprocessing tasks is the search for noun phrases. The accuracy of this task affects the effectiveness of many other natural language processing tasks. Despite the active development of research in natural language processing, the investigation of noun phrase search in Ukrainian texts is still at an early stage.

Results. Different methods of noun phrase detection have been analyzed. The expediency of representing sentences as tree structures has been justified. The key disadvantage of many noun phrase detection methods is that their effectiveness depends heavily on the features of a particular language. Given the unified format of sentence processing and the availability of a trained model for building sentence trees for Ukrainian texts, the Universal Dependencies model has been chosen. A comprehensive method of noun phrase detection in Ukrainian texts that combines Universal Dependencies with a named-entity recognition model has been proposed. The effectiveness of the proposed method has been verified experimentally on a corpus of Ukrainian news. Different accuracy metrics of the method have been calculated.

Conclusions. The results obtained suggest that the proposed method can be used to find noun phrases in Ukrainian texts. The accuracy of the method can be increased by using named-entity recognition models appropriate to the subject area.
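To make the idea of detecting noun phrases on a Universal Dependencies tree more concrete, the following is a minimal illustrative sketch, not the authors' method. It assumes a sentence already parsed into CoNLL-U format (the sample parse is hand-written for illustration, not produced by a trained model), a hypothetical set of NP-internal relations (NP_INTERNAL), named entities supplied as token-index spans by some NER model, and a simple rule that keeps only maximal spans. Spans are taken as min/max token ids, which assumes projective subtrees.

```python
from collections import defaultdict

# Hypothetical set of UD relations assumed to stay inside a noun phrase
# (matched on the base relation, so "flat:name" counts as "flat").
NP_INTERNAL = {"amod", "det", "nummod", "nmod", "flat", "compound", "appos"}

# Hand-written UD-style parse of "Голова Київської міської ради ухвалив нове рішення."
# Columns used: id, form, lemma, upos, head, deprel.
ROWS = [
    (1, "Голова", "голова", "NOUN", 5, "nsubj"),
    (2, "Київської", "київський", "ADJ", 4, "amod"),
    (3, "міської", "міський", "ADJ", 4, "amod"),
    (4, "ради", "рада", "NOUN", 1, "nmod"),
    (5, "ухвалив", "ухвалити", "VERB", 0, "root"),
    (6, "нове", "новий", "ADJ", 7, "amod"),
    (7, "рішення", "рішення", "NOUN", 5, "obj"),
    (8, ".", ".", "PUNCT", 5, "punct"),
]
# Build a 10-column CoNLL-U sentence from the rows above.
SAMPLE = "\n".join(
    "\t".join([str(i), form, lemma, upos, "_", "_", str(head), rel, "_", "_"])
    for i, form, lemma, upos, head, rel in ROWS
)


def parse_conllu(text):
    """Return token dicts for one CoNLL-U sentence, skipping comments,
    multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "3.1")."""
    tokens = []
    for line in text.strip().splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        tokens.append({"id": int(cols[0]), "form": cols[1], "upos": cols[3],
                       "head": int(cols[6]), "deprel": cols[7]})
    return tokens


def noun_phrases(tokens, entity_spans=()):
    """Collect NP-internal subtrees of NOUN/PROPN heads, add NER spans as extra
    candidates, and keep only spans not strictly contained in another span."""
    children = defaultdict(list)
    for t in tokens:
        children[t["head"]].append(t)

    def collect(tok):
        # Token ids of tok plus dependents reached via NP-internal relations.
        ids = [tok["id"]]
        for child in children[tok["id"]]:
            if child["deprel"].split(":")[0] in NP_INTERNAL:
                ids.extend(collect(child))
        return ids

    spans = set(entity_spans)
    for t in tokens:
        if t["upos"] in ("NOUN", "PROPN"):
            ids = collect(t)
            spans.add((min(ids), max(ids)))  # assumes a projective subtree

    maximal = [s for s in spans
               if not any(o != s and o[0] <= s[0] and s[1] <= o[1] for o in spans)]
    forms = {t["id"]: t["form"] for t in tokens}
    return [" ".join(forms[i] for i in range(a, b + 1)) for a, b in sorted(maximal)]


# The entity span (2, 4) stands in for an NER hit on "Київської міської ради".
print(noun_phrases(parse_conllu(SAMPLE), entity_spans=[(2, 4)]))
# → ['Голова Київської міської ради', 'нове рішення']
```

Here the inner phrase "Київської міської ради" is absorbed into the larger phrase headed by "Голова" because of the maximal-span rule. A real pipeline would obtain the tree from a trained Ukrainian UD parser and the spans from an NER model suited to the subject area, and would need a more careful policy for relations such as case, conj and appos.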
