FPGA deep learning acceleration based on convolutional neural network

Paper title

FPGA deep learning acceleration based on convolutional neural network

Paper author

Jun, Xiong

Abstract

Convolutional neural networks (CNNs) are computationally intensive and slow to evaluate. To address this, this paper proposes a CNN hardware accelerator based on a field-programmable gate array (FPGA). First, an in-depth analysis of the forward computation of the convolutional layer, together with an exploration of the parallelism available in it, leads to a hardware architecture that combines input-channel parallelism, output-channel parallelism, and deep pipelining of the convolution window. Within this architecture, a fully parallel multiply-add tree module accelerates the convolution operation, and an efficient window buffer module implements pipelined operation of the convolution window. Experimental results show that the proposed accelerator reaches an energy efficiency of 32.73 GOPS/W, which is 34% higher than the existing solution, with a performance of 317.86 GOPS.
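The abstract names three building blocks: a window buffer, a multiply-add tree, and parallelism across input and output channels. The following is a minimal software sketch of how those pieces relate, not the paper's hardware design. The function names `adder_tree` and `conv_layer`, the stride of 1, the absence of padding, and the integer data are all assumptions made for illustration. Loops that a hardware implementation would unroll in parallel are marked in comments.

```python
def adder_tree(values):
    """Sum values pairwise, level by level, the way a hardware adder tree would."""
    level = list(values)
    if not level:
        return 0
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad odd-sized levels with a zero operand
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]


def conv_layer(ifmap, weights, k):
    """Convolve ifmap[c][y][x] with weights[m][c][ky][kx] (stride 1, no padding).

    Returns ofmap[m][y][x]. This is a sequential model of the computation only.
    """
    c_in = len(ifmap)
    h, w = len(ifmap[0]), len(ifmap[0][0])
    m_out = len(weights)
    out_h, out_w = h - k + 1, w - k + 1
    ofmap = [[[0] * out_w for _ in range(out_h)] for _ in range(m_out)]

    for y in range(out_h):
        for x in range(out_w):
            # Window buffer: one k*k window per input channel, flattened row by row.
            window = [
                [ifmap[c][y + dy][x + dx] for dy in range(k) for dx in range(k)]
                for c in range(c_in)
            ]
            # Output-channel loop: would run in parallel in hardware.
            for m in range(m_out):
                # Input-channel and in-window multiplies: all parallel in hardware.
                products = [
                    window[c][i] * weights[m][c][i // k][i % k]
                    for c in range(c_in)
                    for i in range(k * k)
                ]
                ofmap[m][y][x] = adder_tree(products)
    return ofmap
```

For a single 3x3 input channel holding the values 1 through 9 row by row and one 2x2 kernel of ones, `conv_layer` returns `[[[12, 16], [24, 28]]]`. The pairwise reduction in `adder_tree` is what lets hardware finish a sum of N products in about log2(N) adder stages rather than N sequential additions.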
