Agile Autotuning of a Transprecision Tensor Accelerator Overlay for TVM Compiler Stack

Paper Title


Agile Autotuning of a Transprecision Tensor Accelerator Overlay for TVM Compiler Stack

Authors

Diamantopoulos, Dionysios; Ringlein, Burkhard; Purandare, Mitra; Singh, Gagandeep; Hagleitner, Christoph

Abstract


Specialized accelerators for tensor operations, such as blocked-matrix operations and multi-dimensional convolutions, have emerged as powerful architecture choices for high-performance deep-learning computing. The rapid development of frameworks, models, and precision options challenges the adaptability of such tensor accelerators, since adapting them to new requirements incurs significant engineering costs. Programmable tensor accelerators offer a promising alternative by allowing reconfiguration of a virtual architecture that is overlaid on top of the physical configurable fabric of an FPGA. We propose an overlay (τ-VTA) and an optimization method guided by agile-inspired auto-tuning techniques. We achieve higher performance and faster convergence than the state of the art.
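The abstract mentions blocked-matrix operations and auto-tuning. As a rough illustration only, here is a sketch that tiles a matrix multiplication with a tunable block size and then picks the block size by exhaustive timing. This is not the paper's τ-VTA overlay or its agile-inspired tuner: the helper names `blocked_matmul` and `autotune_block`, the candidate block sizes, and the use of CPU wall-clock timing as the cost signal are hypothetical choices made for this example.

```python
import time

import numpy as np


def blocked_matmul(a: np.ndarray, b: np.ndarray, block: int) -> np.ndarray:
    """Compute a @ b by iterating over block x block tiles (hypothetical helper).

    Each inner step multiplies one tile of `a` by one tile of `b` and
    accumulates the product into the matching tile of the output, i.e. the
    tiled access pattern that blocked-matrix accelerators are built around.
    Edge tiles are handled by NumPy slicing, which truncates past the bounds.
    """
    n, k = a.shape
    k2, m = b.shape
    assert k == k2, "inner dimensions must match"
    c = np.zeros((n, m), dtype=np.result_type(a, b))
    for i in range(0, n, block):
        for j in range(0, m, block):
            for p in range(0, k, block):
                c[i:i + block, j:j + block] += (
                    a[i:i + block, p:p + block] @ b[p:p + block, j:j + block]
                )
    return c


def autotune_block(a, b, candidates, repeats=3):
    """Plain exhaustive search (hypothetical helper, not the paper's method).

    Times every candidate block size and returns (best_block, mean_seconds).
    """
    best_block, best_time = None, float("inf")
    for block in candidates:
        start = time.perf_counter()
        for _ in range(repeats):
            blocked_matmul(a, b, block)
        elapsed = (time.perf_counter() - start) / repeats
        if elapsed < best_time:
            best_block, best_time = block, elapsed
    return best_block, best_time


# Usage: tune the block size on random inputs, then check the tiled result.
rng = np.random.default_rng(0)
a = rng.standard_normal((96, 80))
b = rng.standard_normal((80, 64))
block, seconds = autotune_block(a, b, candidates=[8, 16, 32, 64])
c = blocked_matmul(a, b, block)
print(f"best block={block}, max abs error vs a @ b = {np.abs(c - a @ b).max():.2e}")
```

The search space here is a single integer, so exhaustive search is cheap; the point of a smarter tuner, such as the agile-inspired one the abstract describes, is to converge faster when the space of overlay and precision configurations is too large to enumerate.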
