Paper title
Efficient Accelerator for Dilated and Transposed Convolution with Decomposition
Paper authors
Paper abstract
Hardware acceleration of dilated and transposed convolution enables real-time execution of related tasks such as segmentation, but current designs are either specific to these convolution types or, when reconfigurable, suffer from complex control. This paper presents a design that decomposes the input for dilated convolution and the weights for transposed convolution to skip redundant computations, so that both execute efficiently on existing dense CNN hardware. For the ENet case, the proposed architecture cuts the cycle count by 87.8%, an 8.2× speedup over naive execution.
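The decomposition idea in the abstract can be illustrated with a minimal 1D NumPy sketch. This is not the paper's architecture: the function names, the 1D simplification, and the absence of padding or cropping are all assumptions made for illustration. The naive versions insert zeros into the weights (dilated) or the input (transposed) and then run one dense operation, so most multiplications hit a zero. The decomposed versions split the input (dilated) or the weights (transposed) into phases and run only small dense operations on nonzero data.

```python
import numpy as np


def dilated_naive(x, w, d):
    """Dilated correlation by zero-inserting the kernel (hypothetical baseline)."""
    k = len(w)
    w_dil = np.zeros(d * (k - 1) + 1)
    w_dil[::d] = w  # d-1 zeros between taps: redundant multiplications
    return np.correlate(x, w_dil, mode="valid")


def dilated_decomposed(x, w, d):
    """Same result from d dense correlations on phase-subsampled inputs."""
    k = len(w)
    out_len = len(x) - d * (k - 1)
    if out_len <= 0:
        raise ValueError("input too short for this kernel and dilation")
    y = np.zeros(out_len)
    for p in range(d):
        x_p = x[p::d]  # input phase p
        if len(x_p) < k:
            continue  # this phase produces no output samples
        y[p::d] = np.correlate(x_p, w, mode="valid")  # dense, no zeros
    return y


def transposed_naive(x, w, s):
    """Transposed convolution by zero-upsampling the input (hypothetical baseline)."""
    x_up = np.zeros((len(x) - 1) * s + 1)
    x_up[::s] = x  # s-1 zeros between samples: redundant multiplications
    return np.convolve(x_up, w)  # full convolution


def transposed_decomposed(x, w, s):
    """Same result from s dense convolutions with phase-subsampled weights."""
    y = np.zeros((len(x) - 1) * s + len(w))
    for p in range(s):
        w_p = w[p::s]  # weight phase p
        if w_p.size == 0:
            continue  # no taps land on this output phase; it stays zero
        y[p::s] = np.convolve(x, w_p)  # dense, no zeros
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(32)
    w = rng.standard_normal(3)
    assert np.allclose(dilated_naive(x, w, 4), dilated_decomposed(x, w, 4))
    assert np.allclose(transposed_naive(x, w, 2), transposed_decomposed(x, w, 2))
```

In both cases each output sample depends on only one phase of the input or weights, which is why the phases can be computed independently on dense hardware and interleaved afterwards. The 2D case would apply the same split along both spatial axes.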