Paper Title

L3: Accelerator-Friendly Lossless Image Format for High-Resolution, High-Throughput DNN Training

Paper Authors

Jonghyun Bae, Woohyeon Baek, Tae Jun Ham, Jae W. Lee

Paper Abstract

The training process of deep neural networks (DNNs) is usually pipelined with stages for data preparation on CPUs followed by gradient computation on accelerators like GPUs. In an ideal pipeline, the end-to-end training throughput is eventually limited by the throughput of the accelerator, not by that of data preparation. In the past, the DNN training pipeline achieved a near-optimal throughput by utilizing datasets encoded with a lightweight, lossy image format like JPEG. However, as high-resolution, losslessly-encoded datasets become more popular for applications requiring high accuracy, a performance problem arises in the data preparation stage due to low-throughput image decoding on the CPU. Thus, we propose L3, a custom lightweight, lossless image format for high-resolution, high-throughput DNN training. The decoding process of L3 is effectively parallelized on the accelerator, thus minimizing CPU intervention for data preparation during DNN training. L3 achieves a 9.29x higher data preparation throughput than PNG, the most popular lossless image format, for the Cityscapes dataset on an NVIDIA A100 GPU, which leads to 1.71x higher end-to-end training throughput. Compared to JPEG and WebP, two popular lossy image formats, L3 provides up to 1.77x and 2.87x higher end-to-end training throughput for ImageNet, respectively, at equivalent metric performance.
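The two-stage pipeline the abstract describes can be illustrated with a minimal PyTorch sketch: image decoding runs in CPU DataLoader worker processes while gradient computation runs on the GPU, so end-to-end throughput is bounded by the slower stage. This is a generic illustration of the conventional CPU-decode baseline that L3 targets, not the paper's L3 decoder; the dataset path, model choice, and hyperparameters are hypothetical.

```python
# Minimal sketch of the two-stage DNN training pipeline (illustrative only;
# not the paper's L3 decoder). Stage 1 decodes images on CPU workers;
# Stage 2 computes gradients on the GPU, overlapped with Stage 1.
import torch
import torchvision
from torch.utils.data import DataLoader

# Stage 1 (CPU): each __getitem__ decodes an image file in a worker process.
# Decoding lossless formats like PNG here is the bottleneck the paper targets.
dataset = torchvision.datasets.ImageFolder(
    root="data/train",  # hypothetical dataset path
    transform=torchvision.transforms.Compose([
        torchvision.transforms.Resize((224, 224)),
        torchvision.transforms.ToTensor(),
    ]),
)
loader = DataLoader(dataset, batch_size=64, num_workers=8, pin_memory=True)

# Stage 2 (GPU): gradient computation on a stand-in model.
model = torchvision.models.resnet50().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = torch.nn.CrossEntropyLoss()

for images, labels in loader:
    images = images.cuda(non_blocking=True)  # async host-to-device copy
    labels = labels.cuda(non_blocking=True)
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```

If the CPU workers cannot decode batches as fast as the GPU consumes them, the training loop stalls waiting on the loader; L3's contribution is to move the decode step onto the accelerator so this stall disappears.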
