Paper Title

Efficient Deep Learning Using Non-Volatile Memory Technology

Paper Authors

Ahmet Inci, Mehmet Meric Isgenc, Diana Marculescu

Paper Abstract

Embedded machine learning (ML) systems have now become the dominant platform for deploying ML serving tasks and are projected to become of equal importance for training ML models. With this comes the challenge of overall efficient deployment, in particular low power and high throughput implementations, under stringent memory constraints. In this context, non-volatile memory (NVM) technologies such as STT-MRAM and SOT-MRAM have significant advantages compared to conventional SRAM due to their non-volatility, higher cell density, and scalability features. While prior work has investigated several architectural implications of NVM for generic applications, in this work we present DeepNVM++, a comprehensive framework to characterize, model, and analyze NVM-based caches in GPU architectures for deep learning (DL) applications by combining technology-specific circuit-level models and the actual memory behavior of various DL workloads. DeepNVM++ relies on iso-capacity and iso-area performance and energy models for last-level caches implemented using conventional SRAM and emerging STT-MRAM and SOT-MRAM technologies. In the iso-capacity case, STT-MRAM and SOT-MRAM provide up to 3.8x and 4.7x energy-delay product (EDP) reduction and 2.4x and 2.8x area reduction compared to conventional SRAM, respectively. Under iso-area assumptions, STT-MRAM and SOT-MRAM provide up to 2.2x and 2.4x EDP reduction and accommodate 2.3x and 3.3x cache capacity when compared to SRAM, respectively. We also perform a scalability analysis and show that STT-MRAM and SOT-MRAM achieve orders of magnitude EDP reduction when compared to SRAM for large cache capacities. DeepNVM++ is demonstrated on STT-/SOT-MRAM technologies and can be used for the characterization, modeling, and analysis of any NVM technology for last-level caches in GPUs for DL applications.
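To make the abstract's energy-delay product (EDP) comparison concrete, below is a minimal Python sketch of how an iso-capacity EDP comparison between cache technologies could be assembled from circuit-level parameters and workload access counts. This is not the DeepNVM++ implementation; the `CacheTech` fields and all numeric values are hypothetical placeholders, not figures from the paper.

```python
# Illustrative sketch (not the DeepNVM++ code): an iso-capacity
# energy-delay product (EDP) comparison between cache technologies.
# All per-access numbers below are hypothetical placeholders.

from dataclasses import dataclass

@dataclass
class CacheTech:
    name: str
    read_energy_nj: float    # energy per read access (nJ)
    write_energy_nj: float   # energy per write access (nJ)
    read_latency_ns: float   # latency per read access (ns)
    write_latency_ns: float  # latency per write access (ns)
    leakage_mw: float        # static leakage power (mW)

def edp(tech: CacheTech, reads: int, writes: int, runtime_s: float) -> float:
    """EDP for a workload's last-level-cache accesses.

    Dynamic energy comes from read/write access counts; static energy
    from leakage over the workload runtime. Delay is approximated by
    the total access latency. EDP = total energy (J) * total delay (s).
    """
    dyn_energy_j = (reads * tech.read_energy_nj + writes * tech.write_energy_nj) * 1e-9
    static_energy_j = tech.leakage_mw * 1e-3 * runtime_s
    delay_s = (reads * tech.read_latency_ns + writes * tech.write_latency_ns) * 1e-9
    return (dyn_energy_j + static_energy_j) * delay_s

# Hypothetical circuit-level parameters at a fixed (iso-capacity) cache size:
# NVM trades higher write energy/latency for much lower leakage.
sram = CacheTech("SRAM",     0.20, 0.20, 1.0, 1.0, leakage_mw=500.0)
stt  = CacheTech("STT-MRAM", 0.15, 0.90, 1.2, 4.0, leakage_mw=50.0)

# Hypothetical DL workload memory behavior (e.g., from a GPU profiler).
reads, writes, runtime_s = 5_000_000, 1_000_000, 0.01

for tech in (sram, stt):
    print(f"{tech.name}: EDP = {edp(tech, reads, writes, runtime_s):.3e} J*s")
print(f"EDP reduction vs SRAM: "
      f"{edp(sram, reads, writes, runtime_s) / edp(stt, reads, writes, runtime_s):.2f}x")
```

Under an iso-area assumption, the same comparison would instead hold the cache footprint fixed and scale each technology's capacity (and hence miss behavior) accordingly, which is where the denser NVM cells gain additional advantage.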
