Paper Title
Ax-BxP: Approximate Blocked Computation for Precision-Reconfigurable Deep Neural Network Acceleration
Paper Authors
Paper Abstract
Precision scaling has emerged as a popular technique to optimize the compute and storage requirements of Deep Neural Networks (DNNs). Efforts toward creating ultra-low-precision (sub-8-bit) DNNs suggest that the minimum precision required to achieve a given network-level accuracy varies considerably across networks, and even across layers within a network, requiring support for variable precision in DNN hardware. Previous proposals such as bit-serial hardware incur high overheads, significantly diminishing the benefits of lower precision. To efficiently support precision reconfigurability in DNN accelerators, we introduce an approximate computing method wherein DNN computations are performed block-wise (a block is a group of bits) and reconfigurability is supported at the granularity of blocks. Results of block-wise computations are composed in an approximate manner to enable efficient reconfigurability. We design a DNN accelerator that embodies approximate blocked computation and propose a method to determine a suitable approximation configuration for a given DNN. By varying the approximation configurations across DNNs, we achieve 1.17x-1.73x and 1.02x-2.04x improvements in system energy and performance, respectively, over an 8-bit fixed-point (FxP8) baseline, with negligible loss in classification accuracy. Further, by varying the approximation configurations across layers and data structures within DNNs, we achieve 1.25x-2.42x and 1.07x-2.95x improvements in system energy and performance, respectively, with negligible accuracy loss.
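To make the idea of approximate blocked computation concrete, the following Python sketch shows one way a block-wise multiply with approximate composition can work: each operand is split into fixed-width bit blocks, all block-wise partial products are formed, and only the most significant partial products are composed into the result. The function names, block width, and the "keep the top-k partial products" policy here are illustrative assumptions for exposition, not the paper's actual hardware design.

```python
def split_blocks(x: int, block_bits: int, num_blocks: int) -> list[int]:
    """Split an unsigned integer into `num_blocks` blocks of `block_bits` bits,
    least-significant block first."""
    mask = (1 << block_bits) - 1
    return [(x >> (i * block_bits)) & mask for i in range(num_blocks)]

def blocked_approx_multiply(a: int, b: int, block_bits: int = 4,
                            num_blocks: int = 2, keep: int = 3) -> int:
    """Approximate product of two (block_bits * num_blocks)-bit operands.

    All num_blocks**2 block-wise partial products are computed, but only the
    `keep` most significant ones are composed (summed with their shifts);
    the rest are skipped, trading a small numerical error for fewer
    block-level operations. `keep` plays the role of an approximation
    configuration knob (an assumption of this sketch).
    """
    a_blk = split_blocks(a, block_bits, num_blocks)
    b_blk = split_blocks(b, block_bits, num_blocks)
    # Each partial product is tagged with its bit shift (significance).
    partials = [(a_blk[i] * b_blk[j], (i + j) * block_bits)
                for i in range(num_blocks) for j in range(num_blocks)]
    # Compose only the most significant partial products.
    partials.sort(key=lambda p: p[1], reverse=True)
    return sum(p << s for p, s in partials[:keep])

if __name__ == "__main__":
    a, b = 0xB7, 0x5C                      # two 8-bit operands
    exact = a * b                          # 16836
    approx = blocked_approx_multiply(a, b) # drops the least significant partial product
    print(f"exact={exact}, approx={approx}, error={exact - approx}")
```

Because the dropped partial products carry the lowest significance, the relative error stays small (here 84 out of 16836, about 0.5%), which is consistent with the abstract's claim that composing block results approximately costs negligible classification accuracy while reducing the work per multiply.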