Paper title
Mixing Consistent Deep Clustering
Paper authors
Paper abstract
Finding well-defined clusters in data represents a fundamental challenge for many data-driven applications and depends largely on a good data representation. The representation-learning literature suggests that one key characteristic of a good latent representation is the ability to produce semantically mixed outputs when decoding linear interpolations of two latent codes. We propose Mixing Consistent Deep Clustering, a method that encourages interpolations to appear realistic while adding the constraint that the interpolation of two data points must look like one of the two inputs. Applying this training method to various clustering-specific and non-specific autoencoder models, we found that it systematically changed the structure of the learned representations and improved clustering performance for the tested ACAI, IDEC, and VAE models on the MNIST, SVHN, and CIFAR-10 datasets. These results have practical implications for numerous real-world clustering tasks, as they show that the proposed method can be added to existing autoencoders to further improve clustering performance.
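To make the stated constraint concrete, the sketch below shows one way such a mixing-consistency term could be attached to an ordinary autoencoder objective. It is a minimal, hypothetical PyTorch illustration, not the authors' reference implementation: it assumes flat latent vectors and approximates "the interpolation must look like one of the two inputs" with a pixel-wise term that pulls the decoded interpolation toward the dominant input; `encoder`, `decoder`, and `lambda_mix` are placeholder names introduced here.

```python
# Hypothetical sketch of a mixing-consistency loss term (PyTorch).
# Assumes encoder(x) returns flat latent vectors of shape (B, d) and
# decoder(z) returns tensors with the same shape as the inputs.
import torch
import torch.nn.functional as F

def mixing_consistency_loss(encoder, decoder, x1, x2):
    """Decode a random interpolation of two latent codes and pull it
    toward the input that dominates the mix."""
    z1, z2 = encoder(x1), encoder(x2)
    # Sample mixing coefficients in [0, 0.5] so that x1 is always the dominant input.
    alpha = 0.5 * torch.rand(z1.size(0), 1, device=z1.device)
    z_mix = (1.0 - alpha) * z1 + alpha * z2
    x_mix = decoder(z_mix)
    # Consistency constraint: the decoded interpolation should resemble x1.
    return F.mse_loss(x_mix, x1)

# One possible way to combine it with a standard reconstruction objective
# (lambda_mix is a placeholder weighting hyperparameter):
# loss = reconstruction_loss + lambda_mix * mixing_consistency_loss(encoder, decoder, x1, x2)
```

In this reading, the added term complements an ACAI-style realism objective: the realism part keeps decoded interpolations on the data manifold, while the consistency part discourages them from blending class identities, which is the property the abstract links to better clustering.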