Paper Title

LETI: Latency Estimation Tool and Investigation of Neural Networks inference on Mobile GPU

Paper Authors

Evgeny Ponomarev, Sergey Matveev, Ivan Oseledets

Paper Abstract

Many deep learning applications are intended to run on mobile devices, and both accuracy and inference time matter for many of them. While the number of FLOPs is usually used as a proxy for neural network latency, it may not be the best choice. To obtain a better approximation of latency, the research community uses look-up tables of the latencies of all possible layers to predict inference time on mobile CPUs; this requires only a small number of experiments. Unfortunately, on mobile GPUs this method is not applicable in a straightforward way and shows low precision. In this work, we treat latency approximation on mobile GPUs as a data- and hardware-specific problem. Our main goal is to construct a convenient latency estimation tool for investigation (LETI) of neural network inference and to build robust and accurate latency prediction models for each specific task. To achieve this goal, we build open-source tools that provide a convenient way to conduct massive experiments on different target devices, focusing on mobile GPUs. After evaluating the dataset, we learn a regression model on the experimental data and use it for future latency prediction and analysis. We experimentally demonstrate the applicability of this approach on a subset of the popular NAS-Benchmark 101 dataset and also evaluate the most popular neural network architectures on two mobile GPUs. As a result, we construct a latency prediction model with good precision on the target evaluation subset. We consider LETI a useful tool for neural architecture search or massive latency evaluation. The project is available at https://github.com/leti-ai
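
To make the contrast in the abstract concrete, below is a minimal, hypothetical sketch of the two estimation strategies it describes: summing per-layer latencies from a look-up table, and fitting a regression model on latencies measured on the target device. The table entries, architecture features, and measured values are placeholders invented for illustration and do not come from the LETI repository; the regression is plain ridge regression from scikit-learn rather than the authors' specific model.

# Hypothetical sketch of the two latency-estimation strategies from the abstract.
# All names, keys, and numbers below are illustrative placeholders, not LETI code.
from sklearn.linear_model import Ridge
import numpy as np

# --- Strategy 1: look-up table of per-layer latencies (works reasonably on CPU) ---
# Keys are hypothetical (layer type, channels, spatial size); values are latencies in ms.
layer_latency_table = {
    ("conv3x3", 32, 112): 1.8,
    ("conv1x1", 64, 56): 0.4,
    ("maxpool", 64, 56): 0.2,
}

def lut_latency(layers):
    """Sum per-layer table latencies; ignores scheduling and kernel-fusion effects,
    which is why the abstract reports low precision for this approach on mobile GPU."""
    return sum(layer_latency_table[layer] for layer in layers)

print(lut_latency([("conv3x3", 32, 112), ("conv1x1", 64, 56)]))  # 2.2 ms estimate

# --- Strategy 2: regression on latencies measured on the target device ---
# X holds per-architecture features (e.g. FLOPs, parameter count, depth);
# y holds end-to-end latencies measured on the target mobile GPU (placeholders here).
X = np.array([[120e6, 1.2e6, 12],
              [450e6, 4.0e6, 20],
              [900e6, 8.5e6, 28]])
y = np.array([5.1, 14.3, 27.8])  # milliseconds, hypothetical measurements

model = Ridge(alpha=1.0).fit(X, y)
predicted_ms = model.predict([[300e6, 2.5e6, 16]])
print(f"predicted latency: {predicted_ms[0]:.1f} ms")

The point of the second strategy is that the model is fit per device and per task, so GPU-specific effects such as scheduling and kernel fusion are absorbed by the measured data rather than modeled layer by layer.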
