Paper Title

CSB-RNN: A Faster-than-Realtime RNN Acceleration Framework with Compressed Structured Blocks

Authors

Runbin Shi, Peiyan Dong, Tong Geng, Yuhao Ding, Xiaolong Ma, Hayden K.-H. So, Martin Herbordt, Ang Li, Yanzhi Wang

Abstract

Recurrent neural networks (RNNs) have been widely adopted in temporal sequence analysis, where real-time performance is often in demand. However, RNNs suffer from a heavy computational workload, as the model often comes with large weight matrices. Pruning schemes have been proposed for RNNs to eliminate the redundant (close-to-zero) weight values. On one hand, non-structured pruning methods achieve a high pruning rate but introduce computation irregularity (random sparsity), which is unfriendly to parallel hardware. On the other hand, hardware-oriented structured pruning suffers from a low pruning rate due to the restricted constraints on the allowable pruning structure. This paper presents CSB-RNN, an optimized full-stack RNN framework with a novel compressed structured block (CSB) pruning technique. The CSB-pruned RNN model comes with both fine pruning granularity, which facilitates a high pruning rate, and a regular structure, which benefits hardware parallelism. To address the challenges in parallelizing inference of the CSB-pruned model with fine-grained structural sparsity, we propose a novel hardware architecture with a dedicated compiler. Benefiting from the architecture-compilation co-design, the hardware not only supports various RNN cell types but is also able to address the challenging workload-imbalance issue, and therefore significantly improves hardware efficiency.
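The abstract describes the key idea of CSB pruning: weights are pruned at fine (individual-weight) granularity, yet the sparsity is organized within regular blocks so that every block keeps the same number of nonzeros, which is what makes the result hardware-friendly. The following NumPy sketch is only a rough illustration of that block-regular, fine-grained pruning idea, not the paper's actual CSB algorithm or compression format; the block size and keep ratio are assumed parameters chosen for the example.

```python
import numpy as np


def block_regular_prune(weight, block_rows=16, block_cols=16, keep_ratio=0.25):
    """Illustrative sketch: keep the largest-magnitude weights in each block.

    Every (block_rows x block_cols) block retains the same number of nonzeros,
    giving a regular, balanced structure, while individual weights inside the
    block are pruned (fine granularity). This is a simplification of the idea
    sketched in the abstract, not the paper's CSB pruning scheme.
    """
    pruned = weight.copy()
    rows, cols = weight.shape
    for r in range(0, rows, block_rows):
        for c in range(0, cols, block_cols):
            block = pruned[r:r + block_rows, c:c + block_cols]  # view into pruned
            k = max(1, int(round(keep_ratio * block.size)))
            # Threshold at the k-th largest magnitude within this block.
            thresh = np.sort(np.abs(block), axis=None)[-k]
            block[np.abs(block) < thresh] = 0.0
    return pruned


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((64, 64)).astype(np.float32)
    w_pruned = block_regular_prune(w)
    print("overall sparsity:", float(np.mean(w_pruned == 0.0)))
```

Because every block ends up with the same nonzero count, parallel hardware can assign one block per processing element without the workload imbalance that random (non-structured) sparsity would cause; handling the residual imbalance across rows and cell types is what the paper's architecture-compilation co-design addresses.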
