ComStreamClust: a communicative multi-agent approach to text clustering in streaming data

Paper Title

ComStreamClust: a communicative multi-agent approach to text clustering in streaming data

Authors

Najafi, Ali; Gholipour-Shilabin, Araz; Dehkharghani, Rahim; Mohammadpur-Fard, Ali; Asgari-Chenaghlu, Meysam

Abstract

Topic detection is the task of determining and tracking hot topics in social media. Twitter is arguably the most popular platform for people to share their ideas with others about different issues. One such prevalent issue is the COVID-19 pandemic. Detecting and tracking topics on issues of this kind would help governments and healthcare companies deal with the phenomenon. In this paper, we propose a novel, multi-agent, communicative clustering approach called ComStreamClust for clustering sub-topics inside a broader topic, e.g., COVID-19. The proposed approach is parallelizable and can handle several data points simultaneously. The LaBSE sentence embedding is used to measure the semantic similarity between two tweets. ComStreamClust has been evaluated on two datasets: COVID-19 and FA Cup. The results obtained from ComStreamClust confirm the effectiveness of the proposed approach when compared to existing methods.
