Paper title
Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction
Paper authors
Paper abstract
Dimensionality reduction is crucial both for visualization and for preprocessing high-dimensional data for machine learning. We introduce a novel method based on a hierarchy built on 1-nearest neighbor graphs in the original space, which is used to preserve the grouping properties of the data distribution on multiple levels. The core of the proposal is an optimization-free projection that is competitive with the latest versions of t-SNE and UMAP in performance and visualization quality while being an order of magnitude faster in run-time. Furthermore, its interpretable mechanics, the ability to project new data, and the natural separation of data clusters in visualizations make it a general-purpose unsupervised dimension reduction technique. In the paper, we argue for the soundness of the proposed method and evaluate it on a diverse collection of datasets with sizes varying from 1K to 11M samples and dimensions from 28 to 16K. We perform comparisons with other state-of-the-art methods on multiple metrics and target dimensions, highlighting its efficiency and performance. Code is available at https://github.com/koulakis/h-nne
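
To make the idea of a hierarchy built on 1-nearest neighbor graphs more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each level groups points by the connected components of their 1-nearest neighbor graph and that the next level repeats the step on the cluster centroids; the abstract does not spell out this rule. The function names `one_nn_components` and `build_hierarchy` are hypothetical, the neighbor search is brute force, and the projection step of the method is not shown.

```python
import numpy as np


def one_nn_components(points):
    """Label each point by the connected component of the 1-NN graph it falls in."""
    n = len(points)
    # Brute-force squared Euclidean distances (fine for a toy example only).
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.einsum("ijk,ijk->ij", diffs, diffs)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    nearest = dists.argmin(axis=1)

    # Union-find over the edges i -> nearest[i].
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in enumerate(nearest):
        root_i, root_j = find(i), find(int(j))
        if root_i != root_j:
            parent[root_i] = root_j

    roots = np.array([find(i) for i in range(n)])
    # Relabel the roots as consecutive cluster ids 0..k-1.
    _, labels = np.unique(roots, return_inverse=True)
    return labels


def build_hierarchy(points):
    """Return one label array per level; level t labels the centroids of level t-1."""
    levels = []
    current = np.asarray(points, dtype=float)
    while len(current) > 1:
        labels = one_nn_components(current)
        levels.append(labels)
        n_clusters = int(labels.max()) + 1
        if n_clusters == 1:
            break
        # Assumption: each cluster is represented by its centroid at the next level.
        current = np.stack(
            [current[labels == c].mean(axis=0) for c in range(n_clusters)]
        )
    return levels


# Toy data: two well-separated groups of three points each.
points = np.array(
    [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float
)
levels = build_hierarchy(points)
for depth, labels in enumerate(levels):
    print(depth, labels.tolist())
```

Because every point has an edge to some other point, each component contains at least two points, so the number of clusters at least halves per level and the loop terminates after a logarithmic number of levels. A practical version would replace the quadratic distance matrix with an approximate nearest neighbor search.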