Paper Title

VDPC: Variational Density Peak Clustering Algorithm

Authors

Wang, Yizhang; Wang, Di; Zhou, You; Zhang, Xiaofeng; Quek, Chai

Abstract

The widely applied density peak clustering (DPC) algorithm makes an intuitive cluster formation assumption: cluster centers are often surrounded by data points with lower local density and lie far away from other data points with higher local density. However, this assumption has a limitation: clusters with lower density are often hard to identify, because they can easily be merged into other clusters with higher density. As a result, DPC may fail to identify clusters with varying densities. To address this issue, we propose a variational density peak clustering (VDPC) algorithm, which is designed to systematically and autonomously perform the clustering task on datasets with various types of density distributions. Specifically, we first propose a novel method to identify representatives among all data points and construct initial clusters based on the identified representatives for further analysis of the clusters' properties. Furthermore, we divide all data points into different levels according to their local density and propose a unified clustering framework that combines the advantages of both DPC and DBSCAN. Thus, all the identified initial clusters spreading across different density levels are systematically processed to form the final clusters. To evaluate the effectiveness of the proposed VDPC algorithm, we conduct extensive experiments on 20 datasets, including eight synthetic, six real-world, and six image datasets. The experimental results show that VDPC outperforms two classical algorithms (i.e., DPC and DBSCAN) and four state-of-the-art extended DPC algorithms.
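To make the DPC assumption above concrete, here is a minimal Python sketch of the classic DPC procedure that VDPC builds on. It is not the VDPC algorithm itself. Each point gets a local density ρ (here a cut-off count of neighbors within `d_c`) and a distance δ to its nearest higher-density point; centers are points where both are large. The function name `dpc`, the cut-off kernel, and picking a fixed number of centers `n_centers` by the largest ρ·δ (instead of reading them off a decision graph) are illustrative assumptions.

```python
import math


def dpc(points, d_c, n_centers):
    """Sketch of classic density peak clustering with a cut-off kernel.

    points: list of coordinate tuples; d_c: cut-off distance (assumed
    hyperparameter); n_centers: number of centers to select (assumed >= 1).
    Returns (labels, rho, delta, centers).
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho_i: number of other points closer than d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Sort by decreasing density; ties broken by index so every point
    # except the first has at least one strictly "denser" predecessor.
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    delta = [0.0] * n
    parent = [-1] * n  # nearest point that comes earlier in `order`
    for rank, i in enumerate(order):
        if rank == 0:
            # Densest point: use the largest pairwise distance so that it
            # always has the maximal rho * delta and becomes a center.
            delta[i] = max(max(row) for row in dist)
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Centers: largest gamma = rho * delta (dense AND far from denser points).
    # Sorting `order` stably keeps the densest point first among ties.
    gamma = [rho[i] * delta[i] for i in range(n)]
    centers = sorted(order, key=lambda i: -gamma[i])[:n_centers]

    # Assign in decreasing-density order: each non-center point inherits
    # the label of its nearest higher-density point (already labeled).
    labels = [-1] * n
    for c_id, c in enumerate(centers):
        labels[c] = c_id
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, rho, delta, centers


if __name__ == "__main__":
    # Two well-separated square blobs, each with a denser middle point.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    labels, rho, delta, centers = dpc(pts, d_c=1.0, n_centers=2)
    print(labels)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

Because this sketch uses a single global `d_c`, points in a sparse cluster receive small ρ and therefore small ρ·δ. Such a cluster may then fail to produce a center and be absorbed by a denser neighbor, which is the limitation the abstract attributes to DPC.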