Paper Title
Jointly Optimizing Preprocessing and Inference for DNN-based Visual Analytics
Paper Authors
Paper Abstract
While deep neural networks (DNNs) are an increasingly popular way to query large corpora of data, their significant runtime remains an active area of research. As a result, researchers have proposed systems and optimizations to reduce these costs by allowing users to trade off accuracy and speed. In this work, we examine end-to-end DNN execution in visual analytics systems on modern accelerators. Through a novel measurement study, we show that the preprocessing of data (e.g., decoding, resizing) can be the bottleneck in many visual analytics systems on modern hardware. To address the bottleneck of preprocessing, we introduce two optimizations for end-to-end visual analytics systems. First, we introduce novel methods of achieving accuracy and throughput trade-offs by using natively present, low-resolution visual data. Second, we develop a runtime engine for efficient visual DNN inference. This runtime engine a) efficiently pipelines preprocessing and DNN execution for inference, b) places preprocessing operations on the CPU or GPU in a hardware- and input-aware manner, and c) efficiently manages memory and threading for high throughput execution. We implement these optimizations in a novel system, Smol, and evaluate Smol on eight visual datasets. We show that its optimizations can achieve up to 5.9x end-to-end throughput improvements at a fixed accuracy over recent work in visual analytics.
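The runtime engine described above overlaps CPU preprocessing with batched DNN execution. As an illustration of the pipelining idea only (not the paper's actual implementation), the following sketch uses CPU worker threads feeding a bounded queue that a batched "inference" consumer drains; `decode_and_resize` and `run_dnn` are hypothetical stand-ins for real image decoding and GPU model execution.

```python
import queue
import threading

# Hypothetical stand-ins: a real system would decode/resize JPEGs on the
# CPU and run a DNN on an accelerator. Here we use trivial arithmetic.
def decode_and_resize(x):
    return x * x

def run_dnn(batch):
    return [v + 1 for v in batch]

def pipelined_inference(inputs, batch_size=4, n_workers=2):
    """Overlap CPU preprocessing with batched inference via a bounded queue."""
    todo = queue.Queue()
    done = queue.Queue(maxsize=32)  # bounded: caps memory held in flight

    for x in inputs:
        todo.put(x)

    def worker():
        # Each worker preprocesses items until the input queue is empty.
        while True:
            try:
                x = todo.get_nowait()
            except queue.Empty:
                break
            done.put(decode_and_resize(x))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()

    # Consumer: form batches from preprocessed items and run "inference",
    # so preprocessing of later items proceeds while earlier batches run.
    results, batch, remaining = [], [], len(inputs)
    while remaining:
        batch.append(done.get())
        remaining -= 1
        if len(batch) == batch_size or remaining == 0:
            results.extend(run_dnn(batch))
            batch = []

    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    out = pipelined_inference(list(range(10)))
    print(sorted(out))
```

Because preprocessing runs in multiple threads, per-item completion order is nondeterministic; the bounded `done` queue applies backpressure so preprocessing cannot outrun inference and exhaust memory, echoing the engine's memory-management goal.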