Paper Title
When HLS Meets FPGA HBM: Benchmarking and Bandwidth Optimization
Paper Authors
Paper Abstract
With the recent release of FPGA boards based on High Bandwidth Memory (HBM), developers can now exploit unprecedented external memory bandwidth. This allows more memory-bound applications to benefit from FPGA acceleration. However, we found that it is not easy to fully utilize the available bandwidth when developing some applications with high-level synthesis (HLS) tools. This is due to limitations of existing HLS tools when accessing the large number of independent external memory channels on an HBM board. In this paper, we measure the performance of three recent representative HBM FPGA boards (Intel's Stratix 10 MX and Xilinx's Alveo U50/U280 boards) with microbenchmarks and analyze the HLS overhead. Next, we propose HLS-based optimization techniques to improve the effective bandwidth when a processing element (PE) accesses multiple HBM channels or when multiple PEs access an HBM channel. Our experiments demonstrate that the effective bandwidth improves by 2.4x to 3.8x. We also provide a list of insights for future improvement of the HBM FPGA HLS design flow.