MKPipe: A Compiler Framework for Optimizing Multi-Kernel Workloads in OpenCL for FPGA

Title

MKPipe: A Compiler Framework for Optimizing Multi-Kernel Workloads in OpenCL for FPGA

Authors

Liu, Ji; Kafi, Abdullah-Al; Shen, Xipeng; Zhou, Huiyang

Abstract

OpenCL for FPGA enables developers to design FPGAs using a programming model similar to that of processors. Recent works have shown that code optimization at the OpenCL level is important for achieving high computational efficiency. However, existing works either focus primarily on optimizing single kernels or rely solely on channels to design multi-kernel pipelines. In this paper, we propose MKPipe, a source-to-source compiler framework for optimizing multi-kernel workloads in OpenCL for FPGA. Besides channels, we propose new schemes to enable multi-kernel pipelines. Our optimizing compiler takes a systematic approach to exploring the tradeoffs among these optimization methods. To enable more efficient overlap between kernel executions, we also propose a novel workitem/workgroup-id remapping technique. Furthermore, we propose new algorithms for throughput balancing and resource balancing to tune the optimizations applied to individual kernels in multi-kernel workloads. Our results show that our compiler-optimized multi-kernels achieve up to 3.6x (1.4x on average) speedup over a baseline in which the kernels have already been optimized individually.
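The abstract mentions throughput balancing across the kernels of a multi-kernel pipeline but gives no details of the algorithm. As a rough illustration only, the sketch below assumes a linear chain of kernels in which each stage's throughput scales linearly with a replication factor (for example an unroll or SIMD factor) and each extra unit costs a fixed amount of FPGA resources. Under that toy model it greedily widens the current bottleneck stage until a resource budget is exhausted. The names `Kernel`, `balance`, and `pipeline_throughput`, and the cost model itself, are hypothetical; this is not the algorithm proposed in the MKPipe paper.

```python
from dataclasses import dataclass


@dataclass
class Kernel:
    """One stage of a multi-kernel pipeline (hypothetical cost model)."""
    name: str
    rate_per_unit: float  # items/cycle delivered by one replicated unit (assumed linear)
    cost_per_unit: int    # FPGA resources consumed by one unit, e.g. DSPs (assumed)
    units: int = 1        # current replication factor, e.g. an unroll or SIMD factor

    def throughput(self) -> float:
        return self.rate_per_unit * self.units


def pipeline_throughput(kernels: list[Kernel]) -> float:
    # Stages connected back to back can sustain no more than the slowest stage.
    return min(k.throughput() for k in kernels)


def balance(kernels: list[Kernel], budget: int) -> int:
    """Greedily grow the current bottleneck kernel until the budget runs out.

    Returns the total resources used. This is an illustrative heuristic,
    not the algorithm from the MKPipe paper.
    """
    used = sum(k.cost_per_unit * k.units for k in kernels)
    while True:
        # min() returns the first minimal element, so ties go to the earlier stage.
        bottleneck = min(kernels, key=Kernel.throughput)
        if used + bottleneck.cost_per_unit > budget:
            break  # the slowest stage cannot grow, so the pipeline cannot speed up
        bottleneck.units += 1
        used += bottleneck.cost_per_unit
    return used


if __name__ == "__main__":
    producer = Kernel("producer", rate_per_unit=3.0, cost_per_unit=10)
    consumer = Kernel("consumer", rate_per_unit=0.5, cost_per_unit=4)
    used = balance([producer, consumer], budget=40)
    print(producer.units, consumer.units, used, pipeline_throughput([producer, consumer]))
    # prints: 1 6 34 3.0
```

In this example the consumer is widened to six units so that both stages sustain 3.0 items per cycle, after which adding another producer unit would exceed the budget of 40. A real compiler would also have to account for nonlinear resource usage and effects on achievable clock frequency, which this toy model ignores.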
