Paper Title

Out-of-core Training for Extremely Large-Scale Neural Networks With Adaptive Window-Based Scheduling

Paper Authors

Akio Hayakawa, Takuya Narihira

Paper Abstract

While large neural networks demonstrate higher performance in various tasks, training large networks is difficult due to limitations on GPU memory size. We propose a novel out-of-core algorithm that enables faster training of extremely large-scale neural networks with sizes larger than the allotted GPU memory. Under a given memory budget constraint, our scheduling algorithm locally adapts the timing of memory transfers according to the memory usage of each function, which improves the overlap between computation and memory transfers. Additionally, we apply a virtual addressing technique, commonly used in operating systems, to the training of neural networks with out-of-core execution, which drastically reduces the amount of memory fragmentation caused by frequent memory transfers. With our proposed algorithm, we successfully train ResNet-50 with a batch size of 1440, which is 7.5x larger than the upper bound allowed by physical memory, while keeping training speed at 55%. It also substantially outperforms a previous state of the art, i.e., it trains a 1.55x larger network with faster execution. Moreover, we experimentally show that our approach is also scalable to various types of networks.
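
The core idea behind out-of-core training is to keep only the tensors needed by the current computation on the GPU, offloading the rest to host memory and prefetching them back just before they are used, so that transfers hide behind computation. The sketch below illustrates that overlap using PyTorch CUDA streams; it is a minimal, hypothetical example (the two-layer model, tensor names, and the explicit synchronization points are made up for illustration) and not the paper's adaptive window-based scheduler, which chooses such transfer points automatically under a memory budget.

```python
# Minimal sketch of overlapping GPU compute with host<->device transfers on a
# dedicated CUDA stream -- the basic building block of out-of-core training.
# Illustrative only; this is NOT the paper's adaptive window-based scheduler.
import torch

assert torch.cuda.is_available(), "requires a CUDA-capable GPU"
device = torch.device("cuda")
copy_stream = torch.cuda.Stream()  # side stream dedicated to memory transfers

# A made-up two-layer model, just to have some activations to move around.
x  = torch.randn(1024, 4096, device=device)
w1 = torch.randn(4096, 4096, device=device, requires_grad=True)
w2 = torch.randn(4096, 4096, device=device, requires_grad=True)

h1 = torch.relu(x @ w1)  # forward, layer 1

# Start copying h1 to pinned host memory on the side stream ...
h1_host = torch.empty(h1.shape, dtype=h1.dtype, device="cpu", pin_memory=True)
copy_stream.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(copy_stream):
    h1_host.copy_(h1, non_blocking=True)

# ... while layer 2 keeps the GPU busy, so the transfer is (mostly) hidden.
loss = torch.relu(h1 @ w2).sum()

# In a real out-of-core system, h1 would now be freed on the GPU and
# re-materialized from h1_host just before its gradient is needed.
torch.cuda.current_stream().wait_stream(copy_stream)  # ensure the copy finished
loss.backward()
```

The abstract also mentions a virtual-addressing technique that avoids memory fragmentation under frequent swapping. The toy pool below conveys only the general concept, under the assumption that logical tensors are backed by fixed-size physical chunks: since a tensor never needs a physically contiguous region, freed chunks are always reusable and unusable holes do not accumulate. The class and its methods are hypothetical names, not the paper's CUDA-level mechanism.

```python
# Toy chunk pool: logical tensors map onto fixed-size physical chunks, so
# repeated swap-in/swap-out cannot fragment the pool. Conceptual sketch only.
class ChunkPool:
    def __init__(self, num_chunks, chunk_bytes):
        self.chunk_bytes = chunk_bytes
        self.free_chunks = list(range(num_chunks))  # all physical chunks start free
        self.page_table = {}                        # tensor id -> list of chunk indices

    def map(self, tensor_id, nbytes):
        """Back a logical tensor with as many free chunks as it needs."""
        needed = -(-nbytes // self.chunk_bytes)     # ceiling division
        if needed > len(self.free_chunks):
            raise MemoryError("memory budget exhausted")
        self.page_table[tensor_id] = [self.free_chunks.pop() for _ in range(needed)]

    def unmap(self, tensor_id):
        """Release a tensor's chunks, e.g. after offloading it to host memory."""
        self.free_chunks.extend(self.page_table.pop(tensor_id))

pool = ChunkPool(num_chunks=8, chunk_bytes=1 << 20)  # 8 MiB toy device pool
pool.map("act0", 3 << 20)   # activation swapped in: occupies 3 chunks
pool.map("act1", 2 << 20)
pool.unmap("act0")          # swapped out to host: its chunks return to the pool
pool.map("act2", 4 << 20)   # fits, even though the freed space is not contiguous
```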
