Compiler-Aware Neural Architecture Search for On-Mobile Real-time Super-Resolution

Paper Title

Compiler-Aware Neural Architecture Search for On-Mobile Real-time Super-Resolution

Authors

Yushu Wu, Yifan Gong, Pu Zhao, Yanyu Li, Zheng Zhan, Wei Niu, Hao Tang, Minghai Qin, Bin Ren, Yanzhi Wang

Abstract

Deep learning-based super-resolution (SR) has gained tremendous popularity in recent years because of its high image quality performance and wide application scenarios. However, prior methods typically suffer from large amounts of computations and huge power consumption, causing difficulties for real-time inference, especially on resource-limited platforms such as mobile devices. To mitigate this, we propose a compiler-aware SR neural architecture search (NAS) framework that conducts depth search and per-layer width search with adaptive SR blocks. The inference speed is directly taken into the optimization along with the SR loss to derive SR models with high image quality while satisfying the real-time inference requirement. Instead of measuring the speed on mobile devices at each iteration during the search process, a speed model incorporated with compiler optimizations is leveraged to predict the inference latency of the SR block with various width configurations for faster convergence. With the proposed framework, we achieve real-time SR inference for implementing 720p resolution with competitive SR performance (in terms of PSNR and SSIM) on GPU/DSP of mobile platforms (Samsung Galaxy S21).
