Paper Title

ConiVAT: Cluster Tendency Assessment and Clustering with Partial Background Knowledge

Authors

Punit Rathore, James C. Bezdek, Paolo Santi, Carlo Ratti

Abstract

The VAT method is a visual technique for determining the potential cluster structure and the possible number of clusters in numerical data. Its improved version, iVAT, uses a path-based distance transform to improve the effectiveness of VAT for "tough" cases. Both VAT and iVAT have also been used in conjunction with the single-linkage (SL) hierarchical clustering algorithm. However, they are sensitive to noise and bridge points between clusters in the dataset, and consequently the corresponding VAT/iVAT images are often inconclusive in such cases. In this paper, we propose a constraint-based version of iVAT, which we call ConiVAT, that makes use of background knowledge in the form of constraints to improve VAT/iVAT for challenging and complex datasets. ConiVAT uses the input constraints to learn the underlying similarity metric and builds a minimum transitive dissimilarity matrix before applying VAT to it. We demonstrate the ConiVAT approach to visual assessment and single-linkage clustering on nine datasets to show that it improves the quality of iVAT images for complex datasets and overcomes the limitation of SL clustering with VAT/iVAT due to "noisy" bridges between clusters. Extensive experimental results on nine datasets suggest that ConiVAT outperforms three other semi-supervised clustering algorithms in terms of clustering accuracy.
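To make the two building blocks named in the abstract more concrete, the following is a minimal sketch of a VAT-style reordering and a path-based (minimax) distance transform of the kind iVAT relies on. It is not the authors' implementation and does not include ConiVAT's constraint-based metric learning. The function names `vat_order` and `minimax_transform`, the NumPy-based formulation, and the cubic-time Floyd-Warshall-style update are assumptions chosen for illustration.

```python
import numpy as np


def vat_order(D):
    """Return a VAT-style ordering of the objects in a square dissimilarity matrix.

    Illustrative sketch: start at one endpoint of the largest dissimilarity,
    then repeatedly append the unselected object closest to the selected set
    (the same greedy rule as Prim's minimum spanning tree algorithm).
    """
    n = D.shape[0]
    first = int(np.unravel_index(np.argmax(D), D.shape)[0])
    order = [first]
    remaining = np.ones(n, dtype=bool)
    remaining[first] = False
    # nearest[j] = smallest dissimilarity from object j to any selected object
    nearest = D[first].astype(float).copy()
    for _ in range(n - 1):
        candidates = np.where(remaining, nearest, np.inf)
        nxt = int(np.argmin(candidates))
        order.append(nxt)
        remaining[nxt] = False
        nearest = np.minimum(nearest, D[nxt])
    return np.array(order)


def minimax_transform(D):
    """Path-based distance: for each pair, the smallest achievable value of the
    largest edge along any path between them (O(n^3) sketch)."""
    M = D.astype(float).copy()
    for k in range(M.shape[0]):
        # Allow paths through object k: cost is the larger of the two legs.
        M = np.minimum(M, np.maximum(M[:, k, None], M[None, k, :]))
    return M


# Toy data (hypothetical): two well-separated 1-D groups, interleaved on purpose.
x = np.array([0.0, 10.0, 1.0, 11.0, 2.0, 12.0])
D = np.abs(x[:, None] - x[None, :])

M = minimax_transform(D)
order = vat_order(M)
reordered = M[np.ix_(order, order)]  # matrix that would be displayed as an image
```

Displaying `reordered` as a grayscale image would show dark diagonal blocks for groups of mutually close objects. In ConiVAT, per the abstract, the dissimilarities fed into this kind of pipeline come from a metric learned from the input constraints rather than from the raw distances used here.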
