Paper Title

DRBM-ClustNet: A Deep Restricted Boltzmann-Kohonen Architecture for Data Clustering

Authors

J. Senthilnath, Nagaraj G, Sumanth Simha C, Sushant Kulkarni, Meenakumari Thapa, Indiramma M, Jón Atli Benediktsson

Abstract

A Bayesian Deep Restricted Boltzmann-Kohonen architecture for data clustering, termed DRBM-ClustNet, is proposed. Its core clustering engine consists of a Deep Restricted Boltzmann Machine (DRBM) that processes unlabeled data by creating new features that are uncorrelated with each other and have large variance. Next, the number of clusters is predicted using the Bayesian Information Criterion (BIC), followed by a Kohonen network-based clustering layer. The unlabeled data is processed in three stages so that non-linearly separable datasets can be clustered efficiently. In the first stage, the DRBM performs non-linear feature extraction, capturing highly complex data representations by projecting feature vectors of $d$ dimensions into $n$ dimensions. Most clustering algorithms require the number of clusters to be decided a priori, so the second stage uses BIC to determine the number of clusters automatically. In the third stage, the number of clusters derived from BIC forms the input to the Kohonen network, which clusters the feature-extracted data obtained from the DRBM. This method overcomes the general disadvantages of clustering algorithms, such as prior specification of the number of clusters, convergence to local optima, and poor clustering accuracy on non-linear datasets. In this research, we use two synthetic datasets, fifteen benchmark datasets from the UCI Machine Learning Repository, and four image datasets to analyze DRBM-ClustNet. The proposed framework is evaluated on clustering accuracy and ranked against other state-of-the-art clustering methods. The results demonstrate that DRBM-ClustNet outperforms state-of-the-art clustering algorithms.
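
The second and third stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name in it (`kohonen_cluster`, `bic`, `select_k_and_cluster`, `toy_features`) is hypothetical. Several assumptions are made because the abstract does not give the details: the DRBM stage is skipped and `features` is taken to be already-extracted data; the Kohonen layer is reduced to a winner-take-all update with no neighbourhood function; prototypes are initialised by a farthest-point rule; and BIC is computed from a hard-assignment, shared-variance spherical Gaussian model fitted by the same Kohonen layer for each candidate number of clusters.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(x, prototypes):
    """Index of the prototype closest to x (the winning unit)."""
    return min(range(len(prototypes)), key=lambda j: sq_dist(x, prototypes[j]))


def kohonen_cluster(features, k, epochs=50, lr0=0.5, seed=0):
    """Winner-take-all Kohonen layer with k prototypes (simplified sketch)."""
    rng = random.Random(seed)
    # Assumption: farthest-point initialisation, chosen here for determinism.
    prototypes = [list(features[0])]
    while len(prototypes) < k:
        far = max(features, key=lambda x: min(sq_dist(x, p) for p in prototypes))
        prototypes.append(list(far))
    order = list(range(len(features)))
    for epoch in range(epochs):
        lr = lr0 * (1.0 - epoch / epochs)  # linearly decaying learning rate
        rng.shuffle(order)
        for i in order:
            x = features[i]
            w = nearest(x, prototypes)
            # Move only the winning prototype towards the sample.
            prototypes[w] = [p + lr * (xi - p) for p, xi in zip(prototypes[w], x)]
    labels = [nearest(x, prototypes) for x in features]
    return prototypes, labels


def bic(features, prototypes, labels):
    """BIC under an assumed hard-assignment spherical Gaussian model."""
    n, d, k = len(features), len(features[0]), len(prototypes)
    sse = sum(sq_dist(x, prototypes[l]) for x, l in zip(features, labels))
    var = max(sse / (n * d), 1e-12)  # shared per-dimension variance
    counts = [labels.count(j) for j in range(k)]
    log_lik = sum(c * math.log(c / n) for c in counts if c > 0)
    log_lik -= 0.5 * n * d * (math.log(2.0 * math.pi * var) + 1.0)
    n_params = (k - 1) + k * d + 1  # mixing weights, prototypes, variance
    return -2.0 * log_lik + n_params * math.log(n)


def select_k_and_cluster(features, candidate_ks):
    """Pick the k with the lowest BIC and return it with its cluster labels."""
    best = None
    for k in candidate_ks:
        prototypes, labels = kohonen_cluster(features, k)
        score = bic(features, prototypes, labels)
        if best is None or score < best[0]:
            best = (score, k, labels)
    return best[1], best[2]


def toy_features(points_per_blob=40, seed=1):
    """Three well-separated 2-D blobs standing in for DRBM output."""
    rng = random.Random(seed)
    centres = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    return [[cx + rng.uniform(-0.5, 0.5), cy + rng.uniform(-0.5, 0.5)]
            for cx, cy in centres for _ in range(points_per_blob)]


features = toy_features()
best_k, labels = select_k_and_cluster(features, range(1, 7))
print("selected number of clusters:", best_k)
```

In this sketch the number of clusters is never supplied by hand: only a candidate range is given, and the BIC penalty on extra prototypes and mixing weights decides among them. A full Kohonen network would also update the neighbours of the winning unit, which this simplified version leaves out.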
