Accelerating Barnes-Hut t-SNE Algorithm by Efficient Parallelization on Multi-Core CPUs

Paper title

Accelerating Barnes-Hut t-SNE Algorithm by Efficient Parallelization on Multi-Core CPUs

Authors

Narendra Chaudhary, Alexander Pivovar, Pavel Yakovlev, Andrey Gorshkov, Sanchit Misra

Abstract

t-SNE remains one of the most popular embedding techniques for visualizing high-dimensional data. Most standard packages of t-SNE, such as scikit-learn, use the Barnes-Hut t-SNE (BH t-SNE) algorithm for large datasets. However, existing CPU implementations of this algorithm are inefficient. In this work, we accelerate the BH t-SNE on CPUs via cache optimizations, SIMD, parallelizing sequential steps, and improving parallelization of multithreaded steps. Our implementation (Acc-t-SNE) is up to 261x and 4x faster than scikit-learn and the state-of-the-art BH t-SNE implementation from daal4py, respectively, on a 32-core Intel(R) Icelake cloud instance.
