Paper Title

SmartExchange: Trading Higher-cost Memory Storage/Access for Lower-cost Computation

Paper Authors

Yang Zhao, Xiaohan Chen, Yue Wang, Chaojian Li, Haoran You, Yonggan Fu, Yuan Xie, Zhangyang Wang, Yingyan Lin

Paper Abstract

We present SmartExchange, an algorithm-hardware co-design framework that trades higher-cost memory storage/access for lower-cost computation, enabling energy-efficient inference of deep neural networks (DNNs). We develop a novel algorithm to enforce a specially favorable DNN weight structure, in which each layerwise weight matrix can be stored as the product of a small basis matrix and a large sparse coefficient matrix whose non-zero elements are all powers of two. To the best of our knowledge, this algorithm is the first formulation that integrates the three mainstream model-compression ideas of sparsification or pruning, decomposition, and quantization into one unified framework. The resulting sparse and readily quantized DNNs thus enjoy greatly reduced energy consumption in both data movement and weight storage. On top of that, we design a dedicated accelerator to fully exploit the SmartExchange-enforced weights and improve both energy efficiency and latency. Extensive experiments show that 1) on the algorithm level, SmartExchange outperforms state-of-the-art compression techniques, including sparsification/pruning-only, decomposition-only, and quantization-only methods, in various ablation studies based on nine DNN models and four datasets; and 2) on the hardware level, the proposed SmartExchange-based accelerator improves energy efficiency by up to 6.7$\times$ and achieves a speedup of up to 19.2$\times$ over four state-of-the-art DNN accelerators, when benchmarked on seven DNN models (four standard DNNs, two compact DNN models, and one segmentation model) and three datasets.
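
To make the enforced weight structure more concrete, below is a minimal NumPy sketch of approximating a layer weight matrix W as the product Ce @ B, where B is a small basis matrix and Ce is a large, sparse coefficient matrix whose non-zero entries are restricted to powers of two. This is only an illustration under simplified assumptions, not the authors' actual decomposition or training algorithm: the rank R, iteration count, pruning ratio, and the naive alternating least-squares updates are all hypothetical choices made for readability.

```python
# Illustrative sketch (not the paper's exact algorithm): fit W ~ Ce @ B, with
# B a small basis matrix and Ce a large sparse coefficient matrix whose
# non-zero entries are powers of two.
import numpy as np

def quantize_pow2(x, min_exp=-8, max_exp=4):
    """Round each entry to the nearest signed power of two; zeros stay zero."""
    sign = np.sign(x)
    mag = np.abs(x)
    exp = np.clip(np.round(np.log2(np.where(mag > 0, mag, 1.0))), min_exp, max_exp)
    return np.where(mag > 0, sign * (2.0 ** exp), 0.0)

def smartexchange_like_decompose(W, R=4, n_iters=20, prune_ratio=0.5):
    """Naive alternating fit of W (N x M) ~ Ce (N x R) @ B (R x M)."""
    N, M = W.shape
    rng = np.random.default_rng(0)
    B = rng.standard_normal((R, M)) * 0.1
    for _ in range(n_iters):
        # Least-squares update of the coefficients given the current basis.
        Ce, *_ = np.linalg.lstsq(B.T, W.T, rcond=None)
        Ce = Ce.T
        # Prune the smallest-magnitude coefficients, then snap to powers of two.
        thresh = np.quantile(np.abs(Ce), prune_ratio)
        Ce = np.where(np.abs(Ce) >= thresh, Ce, 0.0)
        Ce = quantize_pow2(Ce)
        # Least-squares update of the basis given the fixed coefficients.
        B, *_ = np.linalg.lstsq(Ce, W, rcond=None)
    return Ce, B

if __name__ == "__main__":
    W = np.random.default_rng(1).standard_normal((64, 16))
    Ce, B = smartexchange_like_decompose(W)
    err = np.linalg.norm(W - Ce @ B) / np.linalg.norm(W)
    print(f"relative error: {err:.3f}, coefficient sparsity: {np.mean(Ce == 0):.2%}")
```

The point of the power-of-two restriction is that rebuilding W from Ce and B on-chip then requires only bit shifts and additions rather than full multiplications, so only the small basis matrix and the sparse, low-bit coefficients need to be stored and fetched. This is the sense in which higher-cost memory storage/access is traded for lower-cost computation.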
