Paper title

Method of the coherence evaluation of Ukrainian text

Authors

Pogorilyy, S. D., Kramov, A. A.

Abstract

Because of the growing role of SEO technologies, automated analysis of article quality has become necessary. Such an approach helps both to return the most intelligible pages for a user's query and to raise a website's position toward the top of the query results. Automated assessment of coherence is one part of the comprehensive analysis of a text. This article analyzes the main methods for measuring text coherence for the Ukrainian language. The expediency of using the semantic similarity graph method in comparison with other methods is explained. An improvement to that method is suggested: pre-training the neural network that produces vector representations of sentences. The original method and its modifications are examined experimentally. Training and evaluation are performed on a corpus of Ukrainian texts previously retrieved from the abstracts and full texts of Ukrainian scientific articles. Testing is carried out on two typical tasks for text coherence assessment: the document discrimination task and the insertion task. Based on this analysis, the most effective combination of the method's modification and its parameter for measuring text coherence is identified.
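The abstract does not give the formulas behind the semantic similarity graph method, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes each sentence is already represented by a vector (here toy 2-D vectors standing in for the output of a pre-trained sentence encoder), links each sentence only to the one before it, and takes the mean cosine similarity of those links as the coherence score. The function names `cosine`, `coherence`, and `discrimination_accuracy` are hypothetical and are not taken from the paper. The last function illustrates the document discrimination task: the original sentence order should score higher than shuffled orders.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def coherence(vectors):
    """Mean similarity between each sentence vector and its predecessor.

    Assumption: this is a simplified graph in which every sentence has a
    single edge to the preceding sentence; the paper may use other variants.
    """
    if len(vectors) < 2:
        return 0.0
    weights = [cosine(vectors[i - 1], vectors[i]) for i in range(1, len(vectors))]
    return sum(weights) / len(weights)


def discrimination_accuracy(vectors, n_permutations=20, seed=0):
    """Share of shuffled orderings that score strictly below the original."""
    rng = random.Random(seed)
    original = coherence(vectors)
    wins = 0
    trials = 0
    for _ in range(n_permutations):
        shuffled = vectors[:]
        rng.shuffle(shuffled)
        if shuffled == vectors:
            continue  # an unchanged order is not a valid negative example
        trials += 1
        if coherence(shuffled) < original:
            wins += 1
    return wins / trials if trials else 0.0


# Toy "document": sentence vectors whose direction drifts by 10 degrees
# per sentence, imitating a text whose topic shifts gradually.
doc = [(math.cos(math.radians(a)), math.sin(math.radians(a)))
       for a in range(0, 60, 10)]

# A manually disordered version of the same document.
disordered = [doc[0], doc[3], doc[1], doc[5], doc[2], doc[4]]

print("original  :", coherence(doc))
print("disordered:", coherence(disordered))
print("accuracy  :", discrimination_accuracy(doc))
```

In a real experiment the toy vectors would be replaced by sentence embeddings of Ukrainian text, and the accuracy would be averaged over a corpus of documents rather than computed for one.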
