Paper Title


Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation

Authors

Byung Hoon Ahn, Prannoy Pilligundla, Amir Yazdanbakhsh, Hadi Esmaeilzadeh

Abstract


Achieving faster execution with shorter compilation time can foster further diversity and innovation in neural networks. However, the current paradigm of executing neural networks relies on hand-optimized libraries, traditional compilation heuristics, or, very recently, genetic algorithms and other stochastic methods. These methods suffer from frequent costly hardware measurements, rendering them not only time consuming but also suboptimal. As such, we devise a solution that can learn to quickly adapt to a previously unseen design space for code optimization, both accelerating the search and improving the output performance. This solution, dubbed Chameleon, leverages reinforcement learning, whose solution takes fewer steps to converge, and develops an adaptive sampling algorithm that not only focuses the costly samples (real hardware measurements) on representative points but also uses a domain-knowledge-inspired logic to improve the samples themselves. Experimentation with real hardware shows that Chameleon provides a 4.45x speedup in optimization time over AutoTVM, while also improving inference time of modern deep networks by 5.6%.
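The adaptive sampling idea described in the abstract (running expensive hardware measurements only on representative points of the candidate space, rather than on every candidate) can be illustrated with plain k-means clustering. The sketch below is hypothetical and not Chameleon's actual implementation: the function names (`adaptive_sample`, `dist2`), the tuple encoding of configurations, and the fixed iteration count are all assumptions made for illustration.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two tuple-encoded configs."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def adaptive_sample(configs, k, seed=0):
    """Pick k representative configurations via simple k-means (Lloyd's
    algorithm). The idea: measure only these k configs on real hardware
    instead of all candidates, cutting measurement cost.
    Illustrative sketch only; not the paper's algorithm verbatim."""
    random.seed(seed)
    centroids = random.sample(configs, k)
    for _ in range(10):  # a few Lloyd iterations are enough for a toy space
        clusters = [[] for _ in range(k)]
        for c in configs:
            i = min(range(k), key=lambda j: dist2(c, centroids[j]))
            clusters[i].append(c)
        centroids = [mean(cl) if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    # Return the real candidate nearest each centroid, since only actual
    # configurations can be compiled and measured on hardware.
    return [min(configs, key=lambda c: dist2(c, m)) for m in centroids]

# Toy design space: two well-separated groups of knob settings.
candidates = [(1, 1), (1, 2), (10, 10), (10, 11)]
representatives = adaptive_sample(candidates, k=2)
```

Only `representatives` (here, one config per cluster) would then be sent to the device for timing, which is the source of the measurement savings the abstract describes.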
