Word Embedding-based Text Processing for Comprehensive Summarization and Distinct Information Extraction

Paper Title

Word Embedding-based Text Processing for Comprehensive Summarization and Distinct Information Extraction

Authors

Xiangpeng Wan, Hakim Ghazzai, Yehia Massoud

Abstract

In this paper, we propose two automated text processing frameworks specifically designed to analyze online reviews. The objective of the first framework is to summarize a review dataset by extracting its essential sentences. This is performed by converting sentences into numerical vectors and clustering them with a community detection algorithm based on their similarity levels. Afterwards, a correlation score is measured for each sentence to determine its importance level in each cluster and assign it as a tag for that community. The second framework is based on a question-answering neural network model trained to extract answers to multiple different questions. The collected answers are then clustered to find multiple distinct answers to a single question that a customer might ask. The proposed frameworks are shown to be more comprehensive than existing review processing solutions.
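
The first framework's pipeline (vectorize sentences, group them by similarity, tag each group with a representative sentence) can be illustrated with a minimal Python sketch. This is not the authors' implementation, and several pieces are assumptions made for illustration: bag-of-words counts stand in for word embeddings, connected components of a thresholded cosine-similarity graph stand in for the community detection algorithm, and the mean similarity of a sentence to the other members of its cluster stands in for the correlation score. The function names and the `threshold` parameter are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(sentence):
    """Bag-of-words counts; an assumed stand-in for a word-embedding vector."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(sentences, threshold=0.5):
    """Return a list of (tag_sentence, cluster_sentences) pairs."""
    vecs = [vectorize(s) for s in sentences]
    n = len(sentences)
    sim = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]

    # Link sentences whose similarity reaches the threshold and treat each
    # connected component as one "community" (a simplification of real
    # community detection).
    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(n):
                if j not in seen and sim[i][j] >= threshold:
                    seen.add(j)
                    stack.append(j)
        clusters.append(sorted(component))

    # Tag each cluster with the member most similar, on average, to the
    # other members (an assumed proxy for the paper's correlation score).
    summary = []
    for component in clusters:
        def score(i):
            others = [sim[i][j] for j in component if j != i]
            return sum(others) / len(others) if others else 0.0

        tag = max(component, key=score)
        summary.append((sentences[tag], [sentences[i] for i in component]))
    return summary


reviews = [
    "The battery life is great",
    "Great battery life",
    "The battery life is really great",
    "Shipping was slow",
    "Very slow shipping",
]
for tag, members in summarize(reviews):
    print(tag, "<-", members)
```

On this toy input the battery-related and shipping-related reviews fall into separate clusters, each labeled by one of its own sentences. A realistic version would swap in pretrained sentence embeddings and a proper community detection method, which typically separates topics more reliably than a fixed similarity threshold.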
