Paper Title
Ax-BxP: Approximate Blocked Computation for Precision-Reconfigurable Deep Neural Network Acceleration
Paper Authors
Paper Abstract
Precision scaling has emerged as a popular technique to optimize the compute and storage requirements of Deep Neural Networks (DNNs). Efforts toward creating ultra-low-precision (sub-8-bit) DNNs suggest that the minimum precision required to achieve a given network-level accuracy varies considerably across networks, and even across layers within a network, requiring support for variable precision in DNN hardware. Previous proposals such as bit-serial hardware incur high overheads, significantly diminishing the benefits of lower precision. To efficiently support precision reconfigurability in DNN accelerators, we introduce an approximate computing method wherein DNN computations are performed block-wise (a block is a group of bits) and reconfigurability is supported at the granularity of blocks. Results of block-wise computations are composed in an approximate manner to enable efficient reconfigurability. We design a DNN accelerator that embodies approximate blocked computation and propose a method to determine a suitable approximation configuration for a given DNN. By varying the approximation configurations across DNNs, we achieve 1.17x-1.73x and 1.02x-2.04x improvements in system energy and performance, respectively, over an 8-bit fixed-point (FxP8) baseline, with negligible loss in classification accuracy. Further, by varying the approximation configurations across layers and data structures within DNNs, we achieve 1.25x-2.42x and 1.07x-2.95x improvements in system energy and performance, respectively, with negligible accuracy loss.
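To make the idea of approximate blocked computation concrete, the following Python sketch shows one way a block-wise multiply with approximate composition can work: each operand is split into fixed-width bit blocks, all block-wise partial products are formed, and only the most significant partial products are composed into the result. The function names, block width, and the "keep the top-k partial products" policy here are illustrative assumptions for exposition, not the paper's actual hardware design.

```python
def split_blocks(x: int, block_bits: int, num_blocks: int) -> list[int]:
    """Split an unsigned integer into `num_blocks` blocks of `block_bits` bits,
    least-significant block first."""
    mask = (1 << block_bits) - 1
    return [(x >> (i * block_bits)) & mask for i in range(num_blocks)]

def blocked_approx_multiply(a: int, b: int, block_bits: int = 4,
                            num_blocks: int = 2, keep: int = 3) -> int:
    """Approximate product of two (block_bits * num_blocks)-bit operands.

    All num_blocks**2 block-wise partial products are computed, but only the
    `keep` most significant ones are composed (summed with their shifts);
    the rest are skipped, trading a small numerical error for fewer
    block-level operations. `keep` plays the role of an approximation
    configuration knob (an assumption of this sketch).
    """
    a_blk = split_blocks(a, block_bits, num_blocks)
    b_blk = split_blocks(b, block_bits, num_blocks)
    # Each partial product is tagged with its bit shift (significance).
    partials = [(a_blk[i] * b_blk[j], (i + j) * block_bits)
                for i in range(num_blocks) for j in range(num_blocks)]
    # Compose only the most significant partial products.
    partials.sort(key=lambda p: p[1], reverse=True)
    return sum(p << s for p, s in partials[:keep])

if __name__ == "__main__":
    a, b = 0xB7, 0x5C                      # two 8-bit operands
    exact = a * b                          # 16836
    approx = blocked_approx_multiply(a, b) # drops the least significant partial product
    print(f"exact={exact}, approx={approx}, error={exact - approx}")
```

Because the dropped partial products carry the lowest significance, the relative error stays small (here 84 out of 16836, about 0.5%), which is consistent with the abstract's claim that composing block results approximately costs negligible classification accuracy while reducing the work per multiply.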