Paper Title
BitPruning: Learning Bitlengths for Aggressive and Accurate Quantization
Paper Authors
Paper Abstract
Neural networks have demonstrably achieved state-of-the-art accuracy using low-bitlength integer quantization, yielding both execution time and energy benefits on existing hardware designs that support short bitlengths. However, the question of finding the minimum bitlength for a desired accuracy remains open. We introduce a training method for minimizing inference bitlength at any granularity while maintaining accuracy. Namely, we propose a regularizer that penalizes large bitlength representations throughout the architecture and show how it can be modified to minimize other quantifiable criteria, such as number of operations or memory footprint. We demonstrate that our method learns thrifty representations while maintaining accuracy. With ImageNet, the method produces an average per-layer bitlength of 4.13, 3.76 and 4.36 bits on AlexNet, ResNet18 and MobileNet V2 respectively, remaining within 2.0%, 0.5% and 0.5% of the base TOP-1 accuracy.
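To make the idea concrete, below is a minimal PyTorch-style sketch of how a learnable-bitlength regularizer could be attached to a training loss. The module name LearnableBitQuant, the simplified symmetric quantizer, the straight-through estimators, and the weight lam are illustrative assumptions for this sketch, not the paper's exact formulation.

```python
import torch
import torch.nn as nn


class LearnableBitQuant(nn.Module):
    """Fake-quantizer with a learnable, continuous bitlength parameter.

    Sketch assumptions: a simplified symmetric uniform quantizer, with
    straight-through estimators so gradients reach both the input and
    the bitlength parameter.
    """

    def __init__(self, init_bits: float = 8.0):
        super().__init__()
        self.bits = nn.Parameter(torch.tensor(init_bits))

    def forward(self, x):
        # Round the continuous bitlength up to an integer, but let the
        # gradient pass straight through to self.bits.
        b = self.bits + (torch.ceil(self.bits) - self.bits).detach()
        # Number of positive quantization levels for a signed representation.
        levels = (2.0 ** (b - 1.0) - 1.0).clamp(min=1.0)
        scale = x.detach().abs().max().clamp(min=1e-8) / levels
        q = torch.round(x / scale)
        # Straight-through estimator for the rounding step.
        q = x / scale + (q - x / scale).detach()
        return q * scale


def bitlength_penalty(model: nn.Module, lam: float = 1e-3):
    """Regularizer that penalizes the sum of learned bitlengths."""
    bits = [m.bits for m in model.modules() if isinstance(m, LearnableBitQuant)]
    return lam * torch.stack(bits).sum()


# Usage inside a training step (hypothetical model/criterion):
#   loss = criterion(model(x), y) + bitlength_penalty(model)
#   loss.backward()
```

Weighting each layer's bitlength term in bitlength_penalty by, for example, its operation count or parameter count would steer the same mechanism toward minimizing compute or memory footprint instead of the plain bitlength sum, in line with the modification the abstract describes.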