Paper Title
Performance Portable Solid Mechanics via Matrix-Free $p$-Multigrid
Paper Authors
Paper Abstract
Finite element analysis of solid mechanics is a foundational tool of modern engineering, with low-order finite element methods and assembled sparse matrices representing the industry standard for implicit analysis. We use performance models and numerical experiments to demonstrate that high-order methods greatly reduce the cost of reaching engineering tolerances while enabling effective use of GPUs; these data structures also offer up to 2x benefit for linear elements. We demonstrate the reliability, efficiency, and scalability of matrix-free $p$-multigrid methods with algebraic multigrid coarse solvers through large-deformation hyperelastic simulations of multiscale structures. We investigate accuracy, cost, and execution time on multi-node CPU and GPU systems for moderate to large models (millions to billions of degrees of freedom) using AMD MI250X (OLCF Crusher), NVIDIA A100 (NERSC Perlmutter), and V100 (LLNL Lassen and OLCF Summit), resulting in order-of-magnitude efficiency improvements over a broad range of model properties and scales. We discuss efficient matrix-free representation of Jacobians and demonstrate how automatic differentiation enables rapid development of nonlinear material models without impacting debuggability and workflows targeting GPUs. The methods are broadly applicable and amenable to common workflows, presented here via open source libraries that encapsulate all GPU-specific aspects and are accessible to both new and legacy code, allowing application code to be GPU-oblivious without compromising end-to-end performance on GPUs.