Paper Title

Enabling On-Device Smartphone GPU based Training: Lessons Learned

Authors

Anish Das, Young D. Kwon, Jagmohan Chauhan, Cecilia Mascolo

Abstract

Deep Learning (DL) has shown impressive performance in many mobile applications. Most existing work has focused on reducing the computational and resource overheads of running Deep Neural Network (DNN) inference on resource-constrained mobile devices. However, the other aspect of DNN operations, i.e., training (forward and backward passes) on smartphone GPUs, has received little attention thus far. To this end, we conduct an initial analysis to examine the feasibility of on-device training on smartphones using mobile GPUs. We first employ the open-source mobile DL framework MNN and its OpenCL backend for running compute kernels on GPUs. Next, we observe that training on CPUs is much faster than on GPUs and identify two possible bottlenecks related to this observation: (i) computation and (ii) memory. To address the computation bottleneck, we optimize the OpenCL backend's kernels, showing 2x improvements (40-70 GFLOPs) over CPUs (15-30 GFLOPs) on Snapdragon 8 series processors. However, we find that full DNN training is still much slower on GPUs than on CPUs, indicating that the memory bottleneck plays a significant role in the lower performance of GPUs relative to CPUs. Data movement takes almost 91% of training time due to the low bandwidth. Lastly, based on the findings and failures during our investigation, we present limitations and practical guidelines for future directions.
