NAWQ-SR: A Hybrid-Precision NPU Engine for Efficient On-Device Super-Resolution

Paper title

NAWQ-SR: A Hybrid-Precision NPU Engine for Efficient On-Device Super-Resolution

Authors

Venieris, Stylianos I., Almeida, Mario, Lee, Royson, Lane, Nicholas D.

Abstract

In recent years, image and video delivery systems have begun integrating deep learning super-resolution (SR) approaches, leveraging their unprecedented visual enhancement capabilities while reducing reliance on networking conditions. Nevertheless, deploying these solutions on mobile devices still remains an active challenge as SR models are excessively demanding with respect to workload and memory footprint. Despite recent progress on on-device SR frameworks, existing systems either penalize visual quality, lead to excessive energy consumption or make inefficient use of the available resources. This work presents NAWQ-SR, a novel framework for the efficient on-device execution of SR models. Through a novel hybrid-precision quantization technique and a runtime neural image codec, NAWQ-SR exploits the multi-precision capabilities of modern mobile NPUs in order to minimize latency, while meeting user-specified quality constraints. Moreover, NAWQ-SR selectively adapts the arithmetic precision at run time to equip the SR DNN's layers with wider representational power, improving visual quality beyond what was previously possible on NPUs. Altogether, NAWQ-SR achieves an average speedup of 7.9x, 3x and 1.91x over the state-of-the-art on-device SR systems that use heterogeneous processors (MobiSR), CPU (SplitSR) and NPU (XLSR), respectively. Furthermore, NAWQ-SR delivers an average of 3.2x speedup and 0.39 dB higher PSNR over status-quo INT8 NPU designs, but most importantly mitigates the negative effects of quantization on visual quality, setting a new state-of-the-art in the attainable quality of NPU-based SR.
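The trade-off between arithmetic precision and quality that the abstract refers to can be illustrated with a small sketch. This is not NAWQ-SR's quantization method, which the abstract does not specify; it only assumes plain uniform symmetric quantization and compares the PSNR of a synthetic signal after a round trip through 8-bit and 16-bit integers. The function names `quantize` and `psnr` and the synthetic `activations` are hypothetical and chosen for illustration.

```python
import math
import random


def quantize(values, bits):
    """Uniform symmetric quantization to signed `bits`-bit integers,
    followed by dequantization back to floats (assumed scheme, not the paper's)."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # float value of one integer step
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def psnr(reference, approx, peak):
    """Peak signal-to-noise ratio in dB between two equally long sequences."""
    mse = sum((r - a) ** 2 for r, a in zip(reference, approx)) / len(reference)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


# Synthetic stand-in for a layer's activations (hypothetical data).
random.seed(0)
activations = [random.uniform(-1.0, 1.0) for _ in range(10_000)]

psnr_int8 = psnr(activations, quantize(activations, 8), peak=1.0)
psnr_int16 = psnr(activations, quantize(activations, 16), peak=1.0)
print(f"INT8: {psnr_int8:.1f} dB, INT16: {psnr_int16:.1f} dB")
```

The wider integer format leaves a smaller rounding error and therefore a higher PSNR, which is the general reason a hybrid-precision design might keep some layers at higher precision. The figures printed here describe a synthetic signal and are unrelated to the results reported in the abstract.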
