Paper title

Leveraging MPI RMA to optimise halo-swapping communications in MONC on Cray machines

Authors

Nick Brown, Michael Bareford, Michèle Weiland

Abstract

Remote Memory Access (RMA), also known as single-sided communications, provides a way of accessing the memory of other processes without having to issue explicit message-passing style communication calls. Previous studies have concluded that MPI RMA can provide increased performance over traditional MPI point-to-point (P2P) communication, but these conclusions are based on synthetic benchmarks. In this work, we replace the existing non-blocking P2P communication calls in the MONC atmospheric model with MPI RMA. We describe our approach in detail and discuss the choices made for correctness and performance. Experiments illustrate that by using RMA we can obtain a reduction of between 5% and 10% in communication time at each timestep on up to 32768 cores, which over the entirety of a run (of many timesteps) results in a significant improvement in performance compared to P2P. However, RMA is not a silver bullet, and there are challenges when integrating RMA into existing codes: important optimisations are necessary to achieve good performance, and library support is not universally mature. In this paper we discuss, in the context of a real-world code, the lessons learned converting P2P to RMA, explore performance and scaling challenges, and contrast alternative RMA synchronisation approaches in detail.
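
To make the access pattern described in the abstract concrete, the following is a minimal sketch of a one-sided halo swap. It is not MPI and not the MONC implementation: it is a single-machine toy model in which Python threads stand in for ranks, shared lists stand in for RMA windows, and a `threading.Barrier` stands in for a fence-style synchronisation call. The function name `halo_swap_one_sided`, the 1D periodic decomposition, and the 3-point averaging update are all assumptions made purely for illustration.

```python
import threading


def halo_swap_one_sided(n_ranks=4, interior=3, steps=1):
    """Toy 1D periodic halo swap using 'puts' into neighbours' memory.

    Hypothetical stand-in for an RMA halo swap: threads model ranks,
    shared lists model windows, and a Barrier models an epoch fence.
    """
    # Per-rank layout: [left halo, interior cells..., right halo].
    windows = [[None] + [float(r)] * interior + [None] for r in range(n_ranks)]
    fence = threading.Barrier(n_ranks)  # assumed analogue of a fence call

    def rank_body(rank):
        left = (rank - 1) % n_ranks
        right = (rank + 1) % n_ranks
        mine = windows[rank]
        for _ in range(steps):
            fence.wait()  # open the epoch: everyone has finished updating
            # "Put" edge cells straight into the neighbours' halo cells.
            # The target rank issues no matching receive call.
            windows[left][-1] = mine[1]    # my first cell -> left rank's right halo
            windows[right][0] = mine[-2]   # my last cell -> right rank's left halo
            fence.wait()  # close the epoch: all puts are now visible
            # Local update that reads the freshly filled halos.
            new = [(mine[i - 1] + mine[i] + mine[i + 1]) / 3.0
                   for i in range(1, interior + 1)]
            mine[1:-1] = new

    threads = [threading.Thread(target=rank_body, args=(r,))
               for r in range(n_ranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return windows


if __name__ == "__main__":
    for rank, cells in enumerate(halo_swap_one_sided()):
        print(rank, cells)
```

The two barrier waits per step mirror the general shape of fence-based synchronisation: no rank writes into a neighbour's halo until every rank has finished its previous update, and no rank reads its halos until every write has landed. Which synchronisation approach performs best in a real code is precisely what the paper investigates, so this sketch should not be read as reflecting the authors' choice.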
