Paper Title
Clustering Categorical Data: Soft Rounding k-modes
Paper Authors
Abstract
Over the last three decades, researchers have intensively explored various clustering tools for categorical data analysis. Despite the proposal of various clustering algorithms, the classical k-modes algorithm remains a popular choice for unsupervised learning of categorical data. Surprisingly, our first insight is that in a natural generative block model, the k-modes algorithm performs poorly for a large range of parameters. We remedy this issue by proposing a soft rounding variant of the k-modes algorithm (SoftModes) and theoretically prove that our variant addresses the drawbacks of the k-modes algorithm in the generative model. Finally, we empirically verify that SoftModes performs well on both synthetic and real-world datasets.
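For orientation, the classical k-modes baseline that the abstract refers to can be sketched as below. This is a minimal illustration of the standard procedure (Hamming-distance assignment followed by coordinate-wise mode updates), not the SoftModes variant, whose rounding rule the abstract does not specify. The function names, the `init` parameter, and the tie-breaking behavior are assumptions made for this example.

```python
from collections import Counter
import random


def hamming(a, b):
    """Number of coordinates in which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent category in each coordinate of a non-empty list of records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(data, k, n_iter=20, seed=0, init=None):
    """Classical k-modes sketch (assumed details: random init, ties go to the lowest index)."""
    rng = random.Random(seed)
    # Start from caller-supplied centers, or from k records sampled from the data.
    centers = [tuple(c) for c in (init if init is not None else rng.sample(data, k))]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each record joins the center at the smallest Hamming distance.
        new_labels = [
            min(range(k), key=lambda j: hamming(row, centers[j])) for row in data
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are stable too
        labels = new_labels
        # Update step: each center becomes the coordinate-wise mode of its cluster.
        for j in range(k):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = column_modes(members)
    return labels, centers


if __name__ == "__main__":
    toy = [
        ("a", "a", "a", "a"), ("a", "a", "a", "b"), ("a", "a", "b", "a"),
        ("z", "z", "z", "z"), ("z", "z", "z", "y"), ("z", "z", "y", "z"),
    ]
    print(k_modes(toy, k=2, init=[toy[0], toy[3]]))
```

The update step is a hard rounding: each coordinate of a center snaps to the single most frequent category, which is the step a soft rounding variant would presumably relax.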