GPU-accelerated discontinuous Galerkin methods on polytopic meshes

Title

GPU-accelerated discontinuous Galerkin methods on polytopic meshes

Authors

Dong, Zhaonan, Georgoulis, Emmanuil H., Kappas, Thomas

Abstract

Discontinuous Galerkin (dG) methods on meshes consisting of polygonal/polyhedral (henceforth collectively termed polytopic) elements have received considerable attention in recent years. Owing to the physical-frame basis functions typically used and the quadrature challenges involved, the matrix-assembly step for these methods is often computationally cumbersome. To address this practical issue, this work proposes two parallel algorithms for implementing the assembly on CUDA-enabled graphics cards, for the interior penalty dG method on polytopic meshes and for various classes of linear PDE problems. We consider both single-GPU parallelization and implementation on distributed GPU nodes. The results show almost linear scalability of the quadrature step with respect to the number of GPU cores used, since the assembly step requires no communication. In turn, this can justify the claim that polytopic dG methods can be implemented extremely efficiently, as the proposed algorithms effectively circumvent any assembly-time overhead relative to finite elements on 'standard' simplicial or box-type meshes.
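The abstract's central observation is that element-local quadrature on polytopic meshes needs no communication, so the work splits cleanly across many cores. The following is a minimal Python sketch of that idea only, not the authors' CUDA algorithms. It rests on assumptions not taken from the paper: convex polygonal elements, a hypothetical physical-frame linear basis {1, x - xc, y - yc} centred at the vertex average, quadrature by fan sub-triangulation with a 3-point degree-2 rule, and only the element-volume mass term (the interior-penalty face terms, which couple neighbouring elements, are omitted). The helper names `local_mass` and `assemble_block_diagonal` are hypothetical, and threads merely stand in for GPU threads.

```python
from concurrent.futures import ThreadPoolExecutor

# 3-point symmetric rule on a triangle (barycentric points, weights summing to 1),
# exact for polynomials of degree <= 2, hence exact for products of two P1 functions.
TRI_RULE = [((2/3, 1/6, 1/6), 1/3), ((1/6, 2/3, 1/6), 1/3), ((1/6, 1/6, 2/3), 1/3)]


def vertex_average(poly):
    # Hypothetical choice of element centre; lies inside any convex polygon.
    n = len(poly)
    return (sum(x for x, _ in poly) / n, sum(y for _, y in poly) / n)


def basis(x, y, xc, yc):
    # Physical-frame P1 basis: monomials defined directly on the element,
    # with no mapping to a reference element.
    return (1.0, x - xc, y - yc)


def local_mass(poly):
    """3x3 local mass matrix of one convex polygon, via a fan of sub-triangles."""
    xc, yc = vertex_average(poly)
    M = [[0.0] * 3 for _ in range(3)]
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        # Area of the sub-triangle (centre, v_i, v_{i+1}).
        area = 0.5 * abs((x1 - xc) * (y2 - yc) - (x2 - xc) * (y1 - yc))
        for (l0, l1, l2), w in TRI_RULE:
            x = l0 * xc + l1 * x1 + l2 * x2
            y = l0 * yc + l1 * y1 + l2 * y2
            phi = basis(x, y, xc, yc)
            for a in range(3):
                for b in range(3):
                    M[a][b] += w * area * phi[a] * phi[b]
    return M


def assemble_block_diagonal(mesh, workers=4):
    # Each element's block depends only on that element's geometry, so the
    # local computations share no state and need no communication.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        blocks = list(pool.map(local_mass, mesh))
    # Sparse storage {(row, col): value}; element e owns DOFs 3e, 3e+1, 3e+2,
    # so no two elements ever write the same entry.
    A = {}
    for e, M in enumerate(blocks):
        for a in range(3):
            for b in range(3):
                A[(3 * e + a, 3 * e + b)] = M[a][b]
    return A
```

For example, `assemble_block_diagonal([[(0, 0), (1, 0), (1, 1), (0, 1)], [(0, 0), (2, 0), (2, 1), (1, 2), (0, 1)]])` returns the two 3x3 blocks of a unit square and a convex pentagon, with the (0, 0) entry of each block equal to the element's area. Because the volume term alone is block-diagonal in dG, the scatter step is conflict-free. Adding the face terms would introduce writes to off-diagonal blocks shared by neighbouring elements, which is where a real GPU implementation has to choose a strategy (for example, per-face ownership or atomic updates).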