Predictive Exit: Prediction of Fine-Grained Early Exits for Computation- and Energy-Efficient Inference

Paper Title

Predictive Exit: Prediction of Fine-Grained Early Exits for Computation- and Energy-Efficient Inference

Authors

Xiangjie Li, Chenfei Lou, Zhengping Zhu, Yuchi Chen, Yingtao Shen, Yehan Ma, An Zou

Abstract

By adding exit layers to a deep learning network, early exit can terminate inference early while still producing accurate results. However, the decision of whether to exit or continue to the next layer is passive: inference has to pass through every pre-placed exit layer until it exits. It is also hard to adjust the configuration of the computing platform as inference proceeds. By incorporating a low-cost prediction engine, we propose a Predictive Exit framework for computation- and energy-efficient deep learning applications. Predictive Exit forecasts where the network will exit (i.e., it establishes the number of remaining layers needed to finish the inference), which effectively reduces the computation cost by exiting on time without running every pre-placed exit layer. Moreover, according to the number of remaining layers, proper computing configurations (i.e., frequency and voltage) are selected to execute the network and further save energy. Extensive experimental results demonstrate that Predictive Exit achieves up to 96.2% computation reduction and 72.9% energy saving compared with classic deep learning networks, and 12.8% computation reduction and 37.6% energy saving compared with early exit under state-of-the-art exiting strategies, given the same inference accuracy and latency.
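
The two ideas in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the names `predictive_exit_inference`, `predict_exit`, `pick_dvfs_level`, the `DVFS_LEVELS` table, and the per-layer cycle count are all hypothetical. It also assumes the predictor is called once on the input and that a lower frequency is acceptable whenever the remaining layers still finish within the time budget.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical operating points as (frequency in MHz, voltage in V), lowest first.
DVFS_LEVELS: List[Tuple[int, float]] = [(600, 0.80), (1000, 0.90), (1400, 1.00)]


def pick_dvfs_level(remaining_layers: int, cycles_per_layer: float,
                    time_left_us: float) -> Tuple[int, float]:
    """Return the lowest-frequency level that still finishes the remaining layers in time."""
    needed_cycles = remaining_layers * cycles_per_layer
    for freq_mhz, volt in DVFS_LEVELS:
        # A frequency in MHz equals the number of cycles available per microsecond.
        if needed_cycles <= freq_mhz * time_left_us:
            return freq_mhz, volt
    return DVFS_LEVELS[-1]  # nothing meets the deadline: fall back to the fastest level


def predictive_exit_inference(x, layers: List[Callable], exit_heads: Dict[int, Callable],
                              predict_exit: Callable, threshold: float):
    """Run layers in order, evaluating exit heads only from the predicted exit onward.

    Returns (label, exit_index, heads_evaluated).
    """
    last = len(layers) - 1
    target = predict_exit(x)  # assumed: a cheap predictor of the exit layer index
    heads_evaluated = 0
    for i, layer in enumerate(layers):
        x = layer(x)
        if i in exit_heads and (i >= target or i == last):
            label, confidence = exit_heads[i](x)
            heads_evaluated += 1
            if confidence >= threshold or i == last:
                return label, i, heads_evaluated
    raise ValueError("the final layer must have an exit head")


# Toy network: six layers that each add 1; every layer has an exit head whose
# confidence grows with depth.
toy_layers = [lambda v: v + 1] * 6
toy_heads = {i: (lambda v: ("cat", v / 6)) for i in range(6)}

predicted = predictive_exit_inference(0, toy_layers, toy_heads, lambda v: 3, 0.6)
passive = predictive_exit_inference(0, toy_layers, toy_heads, lambda v: 0, 0.6)
```

In this toy setup both runs exit at the same layer, but the passive run (predicted exit 0) evaluates every exit head on the way there, while the predictive run evaluates only the one at the predicted position. Once the number of remaining layers is known, `pick_dvfs_level` picks the slowest operating point that still meets the deadline.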
