Paper Title
MSREP: A Fast yet Light Sparse Matrix Framework for Multi-GPU Systems
Paper Authors
Abstract
Sparse linear algebra kernels play a critical role in numerous applications, ranging from exascale scientific simulations to large-scale data analytics. In these applications, offloading linear algebra kernels to a single GPU will no longer be viable, simply because the rapidly growing data volume may exceed the memory capacity and computing power of one GPU. Multi-GPU systems, now ubiquitous in supercomputers and data centers, hold great potential for scaling up large sparse linear algebra kernels. In this work, we design MSREP, a novel sparse matrix representation framework for multi-GPU systems that scales sparse linear algebra operations in a balanced manner based on our augmented sparse matrix formats. Unlike dense operations, sparse operations make it significantly harder to distribute the computation workload evenly among multiple GPUs. We enhance three mainstream sparse data formats (CSR, CSC, and COO) to enable fine-grained data distribution. We take sparse matrix-vector multiplication (SpMV) as an example to demonstrate the efficiency of the MSREP framework. In addition, MSREP can be easily extended to support other sparse linear algebra kernels built on the three fundamental formats (i.e., CSR, CSC, and COO).
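To make the load-balancing difficulty concrete, here is a minimal Python sketch of one generic way to split a CSR SpMV across several devices at nonzero granularity rather than row granularity. This is not MSREP's actual scheme. "Devices" are simulated as sequential function calls, and the helper names `partition_nnz`, `partial_spmv`, and `multi_device_spmv` are hypothetical. Each device receives a near-equal slice of the nonzeros, so one dense row cannot overload a single device. Rows that straddle a slice boundary produce partial sums that a final reduction step merges.

```python
from bisect import bisect_right


def csr_spmv(row_ptr, col_idx, vals, x):
    """Reference single-device CSR SpMV: y = A @ x."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y


def partition_nnz(nnz, num_parts):
    """Split the nonzero index range [0, nnz) into num_parts near-equal chunks.
    Returns num_parts + 1 boundaries; chunk p is [bounds[p], bounds[p + 1])."""
    return [(p * nnz) // num_parts for p in range(num_parts + 1)]


def partial_spmv(row_ptr, col_idx, vals, x, start, end):
    """Work done by one (simulated) device on nonzeros [start, end).
    Returns {row: partial_sum}; a row may be shared with a neighbouring device."""
    partial = {}
    if start >= end:
        return partial
    # Locate the row owning nonzero `start`: the last i with row_ptr[i] <= start
    # (bisect_right also skips any empty rows sharing that offset).
    row = bisect_right(row_ptr, start) - 1
    for k in range(start, end):
        # Advance past rows that end at or before nonzero k.
        while row_ptr[row + 1] <= k:
            row += 1
        partial[row] = partial.get(row, 0.0) + vals[k] * x[col_idx[k]]
    return partial


def multi_device_spmv(row_ptr, col_idx, vals, x, num_devices):
    """nnz-balanced SpMV: partition, compute per device, then reduce."""
    y = [0.0] * (len(row_ptr) - 1)
    bounds = partition_nnz(len(vals), num_devices)
    for d in range(num_devices):
        part = partial_spmv(row_ptr, col_idx, vals, x, bounds[d], bounds[d + 1])
        for row, s in part.items():
            y[row] += s  # reduction: merge partial sums of split rows
    return y


if __name__ == "__main__":
    # A = [[1,0,2,0],[0,0,0,0],[3,4,5,6],[0,7,0,0]] in CSR form
    row_ptr = [0, 2, 2, 6, 7]
    col_idx = [0, 2, 0, 1, 2, 3, 1]
    vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    x = [1.0, 2.0, 3.0, 4.0]
    print(partition_nnz(len(vals), 3))  # nonzero boundaries per device
    print(multi_device_spmv(row_ptr, col_idx, vals, x, 3))
```

With three devices, the seven nonzeros are split as 2, 2, and 3. A row-based split would instead assign all four nonzeros of row 2 to one device. Row 2 is shared by devices 1 and 2, and its two partial sums are combined during the reduction. On real GPUs, that merge step would need an atomic add or a cross-device reduction, which this sketch does not model.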