Paper Title
FPnew: An Open-Source Multi-Format Floating-Point Unit Architecture for Energy-Proportional Transprecision Computing
Paper Authors
Paper Abstract
The slowdown of Moore's law and the power wall necessitate a shift towards finely tunable precision (a.k.a. transprecision) computing to reduce the energy footprint. Hence, we need circuits capable of performing floating-point operations on a wide range of precisions with high energy-proportionality. We present FPnew, a highly configurable open-source transprecision floating-point unit (TP-FPU) capable of supporting a wide range of standard and custom FP formats. To demonstrate the flexibility and efficiency of FPnew in general-purpose processor architectures, we extend the RISC-V ISA with operations on half-precision, bfloat16, and an 8-bit FP format, as well as SIMD vectors and multi-format operations. Integrated into a 32-bit RISC-V core, our TP-FPU can speed up execution of mixed-precision applications by 1.67x w.r.t. an FP32 baseline, while maintaining end-to-end precision and reducing system energy by 37%. We also integrate FPnew into a 64-bit RISC-V core, supporting five FP formats on scalars or 2, 4, or 8-way SIMD vectors. For this core, we measured the silicon manufactured in GlobalFoundries 22FDX technology across a wide voltage range from 0.45 V to 1.2 V. The unit achieves leading-edge measured energy efficiencies between 178 Gflop/sW (on FP64) and 2.95 Tflop/sW (on 8-bit mini-floats), and a performance between 3.2 Gflop/s and 25.3 Gflop/s.
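To make the "five FP formats on scalars or 2, 4, or 8-way SIMD vectors" statement concrete, the following is a minimal sketch of how lane counts follow from format widths on a 64-bit datapath. It is an illustration, not the FPnew implementation: the `FpFormat` class and `lanes_by_format` helper are hypothetical names, the FP64, FP32, FP16, and bfloat16 field widths are the standard ones, and the 5-bit exponent / 2-bit mantissa split for the 8-bit format is an assumption, since the abstract does not state that format's encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FpFormat:
    """Hypothetical descriptor for a binary floating-point format."""
    name: str
    exp_bits: int   # exponent field width
    man_bits: int   # explicit mantissa (fraction) field width

    @property
    def width(self) -> int:
        # One sign bit plus the exponent and mantissa fields.
        return 1 + self.exp_bits + self.man_bits

    def simd_lanes(self, datapath_bits: int = 64) -> int:
        # Number of elements of this format that fit in one register.
        return datapath_bits // self.width


FORMATS = [
    FpFormat("FP64", 11, 52),
    FpFormat("FP32", 8, 23),
    FpFormat("FP16", 5, 10),
    FpFormat("bfloat16", 8, 7),
    FpFormat("FP8", 5, 2),  # ASSUMPTION: encoding not given in the abstract
]


def lanes_by_format(datapath_bits: int = 64) -> dict:
    """Map each format name to its lane count for the given datapath width."""
    return {f.name: f.simd_lanes(datapath_bits) for f in FORMATS}


if __name__ == "__main__":
    for fmt in FORMATS:
        print(f"{fmt.name}: {fmt.width} bits, {fmt.simd_lanes()} lane(s)")
```

Under these assumptions, FP64 occupies the full register as a scalar, while the narrower formats pack into 2, 4, or 8 lanes, which is consistent with the vector widths listed in the abstract.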