Adaptively-weighted Integral Space for Fast Multiview Clustering

Title

Adaptively-weighted Integral Space for Fast Multiview Clustering

Authors

Chen, Man-Sheng; Liu, Tuo; Wang, Chang-Dong; Huang, Dong; Lai, Jian-Huang

Abstract

Multiview clustering has been extensively studied as a way to exploit multi-source information and improve clustering performance. Most existing works compute an n × n affinity graph, either from a similarity/distance metric (e.g., the Euclidean distance) or from learned representations, and then explore the pairwise correlations across views. Unfortunately, this often requires quadratic or even cubic complexity, which makes it difficult to cluster large-scale datasets. Some recent efforts capture the data distribution in multiple views either by selecting view-wise anchor representations with k-means or by directly factorizing the original observations. Despite their significant success, few of these methods consider the view-insufficiency issue; they implicitly assume that each individual view is sufficient to recover the cluster structure. Moreover, the latent integral space and the shared cluster structure of multiple insufficient views cannot be discovered simultaneously. In view of this, we propose Adaptively-weighted Integral Space for Fast Multiview Clustering (AIMC), which has nearly linear complexity. Specifically, view generation models are designed to reconstruct the view observations from the latent integral space with diverse adaptive contributions. Meanwhile, a centroid representation with an orthogonality constraint and a cluster partition are seamlessly constructed to approximate the latent integral space. An alternating minimization algorithm is developed to solve the optimization problem, and it is proved to have linear time complexity w.r.t. the sample size. Extensive experiments on several real-world datasets confirm the superiority of the proposed AIMC method over state-of-the-art methods.
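The abstract describes the method only at a high level. As a rough illustration, the sketch below alternates closed-form updates for one plausible objective consistent with that description. Each view X_v (d_v × n) is reconstructed as W_v H from a shared latent matrix H with view weight α_v. H is in turn approximated by orthonormal centroids C times a one-hot partition Y. The squared weights α_v², the trade-off parameter `lam`, the update order, and the function names `aimc_sketch` and `procrustes` are all assumptions made for illustration, not the authors' published formulation. What it does show is why such a scheme can be linear in n: every update works with d × n, d_v × d, or d × k matrices, never an n × n graph.

```python
import numpy as np


def procrustes(m):
    """Orthonormal-column Q maximizing trace(Q.T @ m) (orthogonal Procrustes)."""
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt


def aimc_sketch(views, k, d, lam=1.0, n_iter=30, seed=0):
    """Alternating minimization of an ASSUMED AIMC-style objective (hypothetical):

        sum_v alpha_v**2 * ||X_v - W_v H||_F^2 + lam * ||H - C Y||_F^2
        s.t. W_v^T W_v = I, C^T C = I, Y one-hot per column, sum(alpha) = 1.

    views: list of (d_v, n) arrays with d_v >= d >= k. No n x n matrix is
    ever formed, so each iteration is linear in the sample size n.
    """
    rng = np.random.default_rng(seed)
    n = views[0].shape[1]
    alpha = np.full(len(views), 1.0 / len(views))   # view weights on the simplex
    H = rng.standard_normal((d, n))                  # latent integral space
    C = procrustes(rng.standard_normal((d, k)))      # orthonormal centroids
    labels = rng.integers(0, k, size=n)
    history = []
    for _ in range(n_iter):
        Y = np.eye(k)[:, labels]                     # k x n cluster indicator
        # W_v-step: orthogonal Procrustes on X_v H^T (shape d_v x d).
        Ws = [procrustes(X @ H.T) for X in views]
        # H-step: closed form, which relies on W_v^T W_v = I.
        a2 = alpha ** 2
        num = sum(w * (W.T @ X) for w, W, X in zip(a2, Ws, views)) + lam * (C @ Y)
        H = num / (a2.sum() + lam)
        # C-step: ||C Y||_F^2 = n is constant, so this is Procrustes on H Y^T.
        C = procrustes(H @ Y.T)
        # Y-step: assign each column of H to its nearest centroid (n x k distances).
        dist = (H ** 2).sum(axis=0)[:, None] - 2.0 * H.T @ C + (C ** 2).sum(axis=0)[None, :]
        labels = dist.argmin(axis=1)
        Y = np.eye(k)[:, labels]
        # alpha-step: min sum_v alpha_v^2 e_v with sum(alpha) = 1 gives alpha_v ∝ 1 / e_v.
        errs = np.array([np.linalg.norm(X - W @ H) ** 2 for W, X in zip(Ws, views)])
        inv = 1.0 / (errs + 1e-12)
        alpha = inv / inv.sum()
        history.append(float((alpha ** 2 * errs).sum() + lam * np.linalg.norm(H - C @ Y) ** 2))
    return labels, alpha, history


# Synthetic two-view data: 3 separated clusters seen through random projections.
rng = np.random.default_rng(1)
n, k = 300, 3
truth = np.repeat(np.arange(k), n // k)
base = 5.0 * np.eye(k)[:, truth] + 0.3 * rng.standard_normal((k, n))
views = [rng.standard_normal((dv, k)) @ base + 0.1 * rng.standard_normal((dv, n)) for dv in (10, 20)]
labels, alpha, history = aimc_sketch(views, k=k, d=k, lam=1.0)
print("view weights:", alpha.round(3))
print("objective (first, last):", round(history[0], 2), round(history[-1], 2))
```

Because each step is an exact minimizer of its block with the others held fixed, the recorded objective should be non-increasing. Under this assumed weighting, the α-update gives more weight to the view with the smaller reconstruction error, since α_v is proportional to 1/e_v.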
