论文标题
分层集群的价格
The Price of Hierarchical Clustering
论文作者
论文摘要
分层聚类是了解数据集的遗传性能的流行工具。这样的聚类实际上是一系列聚类的序列,从琐碎的聚类开始,在该聚类中,每个数据点形成自己的群集,然后连续合并两个现有簇,直到所有点在同一群集中为止。如果每个$ k $ clustering在层次结构中的成本最多是$α$ the Optimal $ k $ coltustering的成本,则分层聚类的近似值为$α$。我们研究任何集群($ k $ - 中心问题)的最大(离散)半径和任何群集的最大直径($ k $ - 直径问题)。通常,最佳聚类不会形成层次结构,因此无法实现$ 1 $的近似因素。我们称之为最小的近似因素,可以在任何情况下达到层次结构的价格。对于$ k $直径的问题,我们将层次结构价格的上限提高到$ 3+2 \ sqrt {2} \ of 5.83 $。此外,我们大大改善了$ k $ center和$ k $ dimeter的下限,证明了分别为$ 4 $和$ 3+2 \ sqrt {2} $的层次结构的价格。
Hierarchical Clustering is a popular tool for understanding the hereditary properties of a data set. Such a clustering is actually a sequence of clusterings that starts with the trivial clustering in which every data point forms its own cluster and then successively merges two existing clusters until all points are in the same cluster. A hierarchical clustering achieves an approximation factor of $α$ if the costs of each $k$-clustering in the hierarchy are at most $α$ times the costs of an optimal $k$-clustering. We study as cost functions the maximum (discrete) radius of any cluster ($k$-center problem) and the maximum diameter of any cluster ($k$-diameter problem). In general, the optimal clusterings do not form a hierarchy and hence an approximation factor of $1$ cannot be achieved. We call the smallest approximation factor that can be achieved for any instance the price of hierarchy. For the $k$-diameter problem we improve the upper bound on the price of hierarchy to $3+2\sqrt{2}\approx 5.83$. Moreover we significantly improve the lower bounds for $k$-center and $k$-diameter, proving a price of hierarchy of exactly $4$ and $3+2\sqrt{2}$, respectively.