Paper Title

Skewed Distributions or Transformations? Modelling Skewness for a Cluster Analysis

Authors

Michael P. B. Gallaugher, Paul D. McNicholas, Volodymyr Melnykov, Xuwen Zhu

Abstract

Because of its mathematical tractability, the Gaussian mixture model holds a special place in the literature for clustering and classification. For all its benefits, however, the Gaussian mixture model poses problems when the data is skewed or contains outliers. Because of this, methods have been developed over the years for handling skewed data, and fall into two general categories. The first is to consider a mixture of more flexible skewed distributions, and the second is based on incorporating a transformation to near normality. Although these methods have been compared in their respective papers, there has yet to be a detailed comparison to determine when one method might be more suitable than the other. Herein, we provide a detailed comparison on many benchmarking datasets, as well as describe a novel method to assess cluster separation.
