Paper Title
On the performance of a highly-scalable Computational Fluid Dynamics code on AMD, ARM and Intel processors
Paper Authors
Paper Abstract
No area of computing is hungrier for performance than High Performance Computing (HPC), the demands of which continue to be a major driver for processor performance and the adoption of accelerators, as well as advances in memory, storage, and networking technologies. A key feature of the Intel processor domination of the past decade has been the extensive adoption of GPUs as coprocessors, whilst more recent developments have seen the increased availability of a number of CPU processors, including the novel ARM-based chips. This paper analyses the performance and scalability of a state-of-the-art Computational Fluid Dynamics (CFD) code on three HPC cluster systems equipped with AMD EPYC-Rome (EPYC, 4096 cores), ARM-based Marvell ThunderX2 (TX2, 8192 cores) and Intel Skylake (SKL, 8000 cores) processors. Three benchmark cases are designed with increasing computation-to-communication ratio and numerical complexity, namely lid-driven cavity flow, the Taylor-Green vortex and a travelling solitary wave using the level-set method, discretised with $4^{th}$-order central differences or a $5^{th}$-order WENO scheme. Our results show that the EPYC cluster delivers the best code performance for all the setups under consideration. In the first two benchmarks, the SKL cluster demonstrates faster computing times than the TX2 system, whilst in the solitary wave simulations, the TX2 cluster achieves good scalability and performance similar to the EPYC system, both improving on that obtained with the SKL cluster. These results suggest that while the Intel SKL cores deliver the best strong scalability, the associated cluster performance is lower compared to the EPYC system. The TX2 cluster performance is promising considering its recent addition to the HPC portfolio.