Paper Title
Generalised Mutual Information for Discriminative Clustering
Paper Authors
Paper Abstract
In the last decade, recent successes in deep clustering have largely relied on mutual information (MI) as an unsupervised objective for training neural networks with ever more regularisation. While the quality of these regularisations has been widely discussed, little attention has been dedicated to the relevance of MI itself as a clustering objective. In this paper, we first highlight how the maximisation of MI does not lead to satisfying clusters. We identify the Kullback-Leibler divergence as the main reason for this behaviour. Hence, we generalise mutual information by changing its core distance, introducing the generalised mutual information (GEMINI): a set of metrics for unsupervised neural network training. Unlike MI, some GEMINIs do not require regularisation during training. Some of these metrics are geometry-aware thanks to distances or kernels in the data space. Finally, we highlight that GEMINIs can automatically select a relevant number of clusters, a property that has been little studied in the deep clustering context, where the number of clusters is a priori unknown.
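For concreteness, here is a minimal sketch of the generalisation the abstract describes, written in LaTeX and based on the standard decomposition of MI as an expected Kullback-Leibler divergence between the cluster-conditional and marginal data distributions. The symbol $\mathcal{I}_D$ and the one-vs-all form shown are illustrative assumptions rather than the paper's exact definition, with $D$ standing for whichever distance between distributions a specific GEMINI adopts (e.g. a kernel- or transport-based one):

  % MI written as an expected KL divergence over cluster assignments y:
  \[ I(X;Y) = \mathbb{E}_{y \sim p(y)}\!\left[ D_{\mathrm{KL}}\!\left( p(x \mid y) \,\|\, p(x) \right) \right] \]
  % A GEMINI-style objective swaps the KL term for a generic distance D:
  \[ \mathcal{I}_D(X;Y) = \mathbb{E}_{y \sim p(y)}\!\left[ D\!\left( p(x \mid y) \,\|\, p(x) \right) \right] \]

Choosing $D$ as a distance or kernel-based discrepancy in the data space is what would make the resulting objective geometry-aware, since the comparison between distributions then depends on how far apart points lie rather than on density ratios alone.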