A System-Level Framework for Analytical and Empirical Reliability Exploration of STT-MRAM Caches

Paper Title

A System-Level Framework for Analytical and Empirical Reliability Exploration of STT-MRAM Caches

Authors

Elham Cheshmikhani, Hamed Farbeh, Hossein Asadi

Abstract

Spin-Transfer Torque Magnetic RAM (STT-MRAM) is regarded as the most promising replacement for SRAM in large Last-Level Caches (LLCs). Despite its major advantages of high density, non-volatility, near-zero leakage power, and immunity to radiation, an STT-MRAM-based cache suffers from high error rates, mainly due to retention failure, read disturbance, and write failure. Existing studies are limited to estimating the rate of only one or two of these error types for STT-MRAM caches. However, no previous study has provided the overall vulnerability of STT-MRAM caches, an estimate that is essential for designing cost-efficient reliable caches. In this paper, we propose a system-level framework for reliability exploration and characterization of error behavior in STT-MRAM caches. To this end, we formulate the cache vulnerability considering the inter-correlation of all three error types as well as the dependency of error rates on workload behavior and Process Variations (PVs). Our analysis reveals that STT-MRAM cache vulnerability is highly workload-dependent and varies by orders of magnitude across cache access patterns. Our analytical study also shows that this vulnerability divergence increases significantly with process variations in STT-MRAM cells. To evaluate the framework, we implement the error types in the gem5 full-system simulator, and the experimental results show that the total error rate in a shared LLC varies by 32.0x across workloads. A further 6.5x vulnerability variation is observed when considering PVs in the STT-MRAM cells. In addition, the contribution of each error type to the total LLC vulnerability varies widely across cache access patterns, and the error rates are affected differently by PVs.
