Paper Title
Design of High-Throughput Mixed-Precision CNN Accelerators on FPGA
Paper Authors
Paper Abstract
Convolutional Neural Networks (CNNs) reach high accuracies in various application domains, but require large amounts of computation and incur costly data movements. One method to decrease these costs, at the expense of some accuracy, is to reduce the word length of weights and/or activations. Layer-wise mixed-precision quantization allows for more efficient results, but it inflates the design space. In this work, we present an in-depth, quantitative methodology to efficiently explore this design space under the limited hardware resources of a given FPGA. Our holistic exploration approach vertically traverses the design entry levels from the architectural down to the logic level, and laterally covers optimizations from the processing elements to the dataflow, yielding an efficient mixed-precision CNN accelerator. The resulting hardware accelerators implement truly mixed-precision operations that enable efficient execution of layer-wise and channel-wise quantized CNNs. Mapping feed-forward and identity-shortcut-connection mixed-precision CNNs results in competitive accuracy-throughput trade-offs: 245 frames/s with 87.48% Top-5 accuracy for ResNet-18, and 92.9% Top-5 accuracy at 1.13 TOp/s for ResNet-152. The required memory footprint for parameters is thereby reduced by 4.9x and 9.4x, respectively, compared to the floating-point baselines.