Paper title
Performance Analysis of a Quantum Monte Carlo Application on Multiple Hardware Architectures Using the HPX Runtime
Paper authors
Abstract
This paper describes how we used the HPX programming model to port the DCA++ application to multiple architectures, including POWER9, x86, ARM v8, and NVIDIA GPUs. We describe the lessons learned from this experience and the benefits of enabling HPX in the application to improve the CPU threading part of the code, which led to an overall 21% improvement across architectures. We also describe how we used HPX-APEX to raise the level of abstraction for understanding performance issues and identifying tasking optimization opportunities in the code, and how these relate to CPU/GPU utilization counters, device memory allocation over time, and CPU kernel-level context switches on a given architecture.