Paper Title
Improved Multi-objective Data Stream Clustering with Time and Memory Optimization
Authors
Abstract
The analysis of data streams has received considerable attention over the past few decades, driven by sources such as sensors and social media. It aims to recognize patterns in an unordered, infinite, and evolving stream of observations. Clustering this type of data must work under time and memory constraints. This paper introduces a new data stream clustering method, IMOC-Stream. Unlike other clustering algorithms, this method uses two different objective functions to capture different aspects of the data. The goals of IMOC-Stream are to: 1) reduce computation time by using idle times to apply genetic operations and enhance the solution; 2) reduce memory allocation by introducing a new tree synopsis; and 3) find arbitrarily shaped clusters by using a multi-objective framework. We conducted an experimental study on high-dimensional stream datasets and compared IMOC-Stream with well-known stream clustering techniques. The experiments show that our method can partition the data stream into arbitrarily shaped, compact, and well-separated clusters while optimizing time and memory. Our method also outperformed most of the stream algorithms in terms of the NMI and ARAND measures.
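The two evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes that ARAND denotes the adjusted Rand index and that NMI is normalized by the arithmetic mean of the two label entropies; the abstract does not state which normalization the authors use. The function names are hypothetical and this is not the authors' evaluation code.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index between two labelings of the same points."""
    n = len(truth)
    if n < 2:
        return 1.0  # convention for degenerate input (assumption)
    pairs = Counter(zip(truth, pred))  # contingency table cells
    rows = Counter(truth)              # true class sizes
    cols = Counter(pred)               # predicted cluster sizes
    same_both = sum(comb(c, 2) for c in pairs.values())
    same_truth = sum(comb(c, 2) for c in rows.values())
    same_pred = sum(comb(c, 2) for c in cols.values())
    expected = same_truth * same_pred / comb(n, 2)
    maximum = (same_truth + same_pred) / 2
    if maximum == expected:
        return 1.0  # both labelings are trivial
    return (same_both - expected) / (maximum - expected)


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(truth, pred):
    """NMI with arithmetic-mean normalization (an assumed choice)."""
    n = len(truth)
    pairs = Counter(zip(truth, pred))
    rows = Counter(truth)
    cols = Counter(pred)
    mutual_info = sum(
        (c / n) * log(n * c / (rows[t] * cols[p]))
        for (t, p), c in pairs.items()
    )
    mean_entropy = (entropy(truth) + entropy(pred)) / 2
    if mean_entropy == 0:
        return 1.0  # both labelings put every point in one cluster
    return mutual_info / mean_entropy


# Identical partitions with permuted label names score 1 on both measures.
truth = [0, 0, 1, 1]
relabeled = [1, 1, 0, 0]
ari_same = adjusted_rand_index(truth, relabeled)
nmi_same = normalized_mutual_info(truth, relabeled)
```

Both measures depend only on how points are grouped, not on the label names, which is why they are suited to comparing a clustering against ground-truth classes.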