Paper Title
Accelerating Backward Aggregation in GCN Training with Execution Path Preparing on GPUs
Paper Authors
Paper Abstract
The emerging Graph Convolutional Network (GCN) has been widely used in many domains, and it is challenging to improve the efficiency of these applications by accelerating GCN training. Due to the sparse nature and exploding scales of real-world input graphs, state-of-the-art GCN training systems (e.g., GNNAdvisor) employ graph processing techniques to accelerate the message exchange (i.e., aggregation) among graph vertices. Nevertheless, these systems treat the aggregation stages of both the forward and backward propagation phases as all-active graph processing procedures that indiscriminately conduct computation on all vertices of the input graph. In this paper, we first point out that in a GCN training problem with a given training set, the aggregation stages of the backward propagation phase (referred to as backward aggregation in this paper) can be converted into partially-active graph processing procedures, which conduct computation on only part of the vertices of the input graph. Leveraging this finding, we propose an execution path preparing method that collects and coalesces the data used during the backward propagation of GCN training conducted on GPUs. Experimental results show that, compared with GNNAdvisor, our approach improves the performance of backward aggregation in GCN training on typical real-world graphs by 1.48x~5.65x. Moreover, execution path preparing can be conducted either before the training (during preprocessing) or on-the-fly with the training. When used during preprocessing, our approach improves the overall GCN training by 1.05x~1.37x; when used on-the-fly, it improves the overall GCN training by 1.03x~1.35x.
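The key observation behind the partially-active backward aggregation can be illustrated with a small sketch. Since the loss is computed only on the training vertices, the gradient of an L-layer GCN reaches at most the k-hop neighborhood of the training set at the k-th backward layer, so aggregation outside that neighborhood is unnecessary. The snippet below is a minimal, hypothetical illustration of this idea (not the paper's execution path preparing implementation); the `graph` adjacency dict, `train_set`, and function name are assumptions for demonstration only.

```python
# Minimal sketch: which vertices actually need backward aggregation per layer.
# Assumption: loss is evaluated only on training vertices, so gradients spread
# outward by one hop per backward layer.

def active_vertices_per_backward_layer(graph, train_set, num_layers):
    """Return, for each backward layer k, the set of vertices whose
    aggregation result is needed (the k-hop neighborhood of the training set)."""
    frontier = set(train_set)
    active = [set(frontier)]              # layer 0: gradients start at training vertices
    for _ in range(num_layers - 1):
        expanded = set(frontier)
        for v in frontier:
            expanded.update(graph[v])     # grow the active set by one hop
        frontier = expanded
        active.append(set(frontier))
    return active

# Example: a 2-layer GCN on a path graph with a single training vertex.
toy_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(active_vertices_per_backward_layer(toy_graph, train_set={0}, num_layers=2))
# -> [{0}, {0, 1}]  (vertices 2..4 never need backward aggregation here)
```

In this toy example only a small fraction of vertices is active in each backward layer, which is why treating backward aggregation as an all-active procedure wastes computation on graphs with small training sets.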