TensorFlow as a DSL for stencil-based computation on the Cerebras Wafer Scale Engine

Paper title

TensorFlow as a DSL for stencil-based computation on the Cerebras Wafer Scale Engine

Paper authors

Nick Brown, Brandon Echols, Justs Zarins, Tobias Grosser

Paper abstract

The Cerebras Wafer Scale Engine (WSE) is an accelerator that combines hundreds of thousands of AI cores onto a single chip. Whilst this technology has been designed for machine learning workloads, the significant amount of available raw compute means that it is also a very interesting potential target for accelerating traditional HPC computational codes. Many of these algorithms are stencil-based, where update operations involve contributions from neighbouring elements. In this paper we explore the suitability of this technology for such codes from the perspective of an early adopter, compared to CPUs and GPUs. Using TensorFlow as the interface, we explore the performance and demonstrate that, whilst there is still work to be done around exposing the programming interface to users, performance of the WSE is impressive: in our experiments it outperforms four V100 GPUs by two and a half times and two Intel Xeon Platinum CPUs by around 114 times. There is therefore significant potential for this technology to play an important role in accelerating HPC codes on future exascale supercomputers.
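To make the term "stencil-based" concrete, the following is a minimal sketch of one stencil update in plain Python. It is not the authors' TensorFlow implementation, and the abstract does not say which stencil the paper uses. The 5-point Jacobi averaging scheme, the function name `jacobi_step`, and the 4x4 grid are all assumptions chosen for illustration.

```python
def jacobi_step(grid):
    """Apply one 5-point Jacobi update to the interior of a 2D grid.

    Boundary cells are held fixed. Each interior cell becomes the
    average of its four neighbours (north, south, west, east).
    """
    rows, cols = len(grid), len(grid[0])
    # Copy the grid so every read in this step sees the old values.
    new = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return new


# Hypothetical 4x4 grid: top boundary held at 1.0, all other cells 0.0.
grid = [[1.0, 1.0, 1.0, 1.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0]]
result = jacobi_step(grid)
```

Every interior cell is updated from the same fixed neighbourhood pattern, which is what makes stencil codes a natural candidate for hardware with many cores.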
