Paper Title

SoftSNN: Low-Cost Fault Tolerance for Spiking Neural Network Accelerators under Soft Errors

Paper Authors

Rachmad Vidya Wicaksana Putra, Muhammad Abdullah Hanif, Muhammad Shafique

Paper Abstract

Specialized hardware accelerators have been designed and employed to maximize the performance efficiency of Spiking Neural Networks (SNNs). However, such accelerators are vulnerable to transient faults (i.e., soft errors), which occur due to high-energy particle strikes and manifest as bit flips at the hardware layer. These errors can change the weight values and neuron operations in the compute engine of SNN accelerators, thereby leading to incorrect outputs and accuracy degradation. Yet the impact of soft errors in the compute engine, and the respective mitigation techniques, have not been thoroughly studied for SNNs. A potential solution is employing redundant executions (re-execution) to ensure correct outputs, but this incurs huge latency and energy overheads. Toward this, we propose SoftSNN, a novel methodology to mitigate soft errors in the weight registers (synapses) and neurons of SNN accelerators without re-execution, thereby maintaining accuracy with low latency and energy overheads. Our SoftSNN methodology employs the following key steps: (1) analyzing the SNN characteristics under soft errors to identify faulty weights and neuron operations, which is required for recognizing faulty SNN behavior; (2) a Bound-and-Protect technique that leverages this analysis to improve SNN fault tolerance by bounding the weight values and protecting the neurons from faulty operations; and (3) devising lightweight hardware enhancements for the neural hardware accelerator to efficiently support the proposed technique. The experimental results show that, for a 900-neuron network even under a high fault rate, our SoftSNN keeps accuracy degradation below 3%, while reducing latency and energy by up to 3x and 2.3x, respectively, compared to the re-execution technique.
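
To make the Bound-and-Protect idea concrete, below is a minimal Python sketch assuming a simple leaky integrate-and-fire (LIF) neuron: weights are clamped to a profiled range so a bit-flipped synapse cannot dominate the integration, and the membrane potential is saturated when it exceeds a plausible bound. The function names, the weight range, and the potential cap are illustrative assumptions, not values or interfaces from the paper.

    # Minimal sketch of a Bound-and-Protect-style guard for one LIF neuron.
    # W_MIN/W_MAX and V_CAP are assumed, illustrative bounds (not from the paper).
    import numpy as np

    W_MIN, W_MAX = 0.0, 1.0     # assumed valid weight range from offline analysis
    V_THRESH = 1.0              # firing threshold
    V_CAP = 2.0 * V_THRESH      # assumed cap flagging a faulty neuron operation

    def bound_weights(weights: np.ndarray) -> np.ndarray:
        """Clamp weights so a bit flip in a weight register cannot push a
        synapse outside the profiled range."""
        return np.clip(weights, W_MIN, W_MAX)

    def protect_neuron(v_mem: float) -> float:
        """Saturate the membrane potential when it exceeds a plausible bound,
        treating larger values as the result of a faulty operation."""
        return min(max(v_mem, 0.0), V_CAP)

    def lif_step(v_mem: float, weights: np.ndarray, spikes: np.ndarray,
                 leak: float = 0.9) -> tuple[float, bool]:
        """One protected LIF update: bounded weights, protected potential."""
        w = bound_weights(weights)                         # guard the synapses
        v_mem = protect_neuron(leak * v_mem + w @ spikes)  # guard neuron state
        if v_mem >= V_THRESH:                              # spike and reset
            return 0.0, True
        return v_mem, False

    # Example: a flipped high-order bit makes one weight huge; bounding contains it.
    weights = np.array([0.2, 0.5, 1e6])   # 1e6 mimics a bit-flipped weight register
    v, fired = lif_step(0.0, weights, np.array([1.0, 1.0, 1.0]))
    print(v, fired)                       # faulty weight clipped before integration

Note that, unlike re-execution, both guards act inline on each update, which is why this style of protection avoids the latency and energy cost of running the computation twice.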
