Title
Medha: Microcoded Hardware Accelerator for computing on Encrypted Data
Authors
Abstract
Homomorphic encryption (HE) enables computation on encrypted data, and hence it has great potential for privacy-preserving outsourcing of computation to the cloud. Hardware acceleration of HE is crucial because software implementations are very slow. In this paper, we present design methodologies for building a programmable hardware accelerator that speeds up cloud-side homomorphic evaluations on encrypted data. First, we propose a divide-and-conquer technique that enables homomorphic evaluations in a large polynomial ring $R_{Q,2N}$ to use a hardware accelerator built for the smaller ring $R_{Q,N}$. This technique makes it possible to use a single hardware accelerator flexibly to support several HE parameter sets. Next, we present several architectural design methods that we use to realize a flexible, instruction-set-based accelerator architecture, which we call Medha. At every level of the implementation hierarchy, we explore opportunities for parallel processing. Starting from hardware-friendly parallel algorithms for the basic building blocks, we gradually build heavily parallel RNS polynomial arithmetic units. Many of these parallel units are then interconnected so that their interconnections require the minimum number of nets, making the overall architecture placement-friendly on the platform. For Medha, we take a memory-conservative design approach and eliminate all off-chip memory accesses during homomorphic evaluations. Finally, we implement Medha on a Xilinx Alveo U250 FPGA and measure the timing performance of microcoded homomorphic addition, multiplication, key-switching, and rescaling for the leveled HE scheme RNS-HEAAN at a 200 MHz clock frequency. For two large parameter sets, Medha achieves speedups of up to 68x and 78x, respectively, compared to the highly optimized software library Microsoft SEAL running at 2.3 GHz.
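The abstract names the divide-and-conquer technique without detailing it. The sketch below shows one standard way such a reduction can work, assuming an even/odd coefficient split; the paper's actual decomposition, and how it maps onto NTT-based RNS hardware, may differ. Writing $y = x^2$, the identity $x^{2N}+1 = y^N+1$ lets any $a \in \mathbb{Z}_q[x]/(x^{2N}+1)$ be written as $a_e(y) + x\,a_o(y)$ with $a_e, a_o \in \mathbb{Z}_q[y]/(y^N+1)$, so that $ab = (a_e b_e + y\,a_o b_o) + x\,(a_e b_o + a_o b_e)$. The result needs only products in the smaller ring. All function names are hypothetical, and the schoolbook `negacyclic_mul` merely stands in for an $R_{q,N}$ hardware multiplier.

```python
# Hypothetical sketch: multiply in Z_q[x]/(x^{2N}+1) using only multiplications
# in Z_q[y]/(y^N+1), where y = x^2. Not the paper's implementation.

def negacyclic_mul(a, b, q):
    """Schoolbook product in Z_q[y]/(y^n + 1); stands in for an R_{q,N} multiplier."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                # y^n = -1, so terms past degree n-1 wrap around with a sign flip
                c[k - n] = (c[k - n] - a[i] * b[j]) % q
    return c

def split_even_odd(a):
    """Return (a_e, a_o) such that a(x) = a_e(x^2) + x * a_o(x^2)."""
    return a[0::2], a[1::2]

def mul_by_y(a, q):
    """Multiply by y in Z_q[y]/(y^n + 1): shift up one slot, wrap the top term negated."""
    return [(-a[-1]) % q] + a[:-1]

def poly_add(a, b, q):
    """Coefficient-wise addition mod q."""
    return [(u + v) % q for u, v in zip(a, b)]

def mul_2n_via_n(a, b, q):
    """Product in the size-2N ring, built from four size-N ring products."""
    ae, ao = split_even_odd(a)
    be, bo = split_even_odd(b)
    # Even part: a_e*b_e + y*a_o*b_o
    ce = poly_add(negacyclic_mul(ae, be, q),
                  mul_by_y(negacyclic_mul(ao, bo, q), q), q)
    # Odd part: a_e*b_o + a_o*b_e
    co = poly_add(negacyclic_mul(ae, bo, q),
                  negacyclic_mul(ao, be, q), q)
    # Interleave back: c_{2i} = ce_i, c_{2i+1} = co_i
    c = [0] * (2 * len(ce))
    c[0::2] = ce
    c[1::2] = co
    return c
```

This version uses four half-size multiplications for clarity. A Karatsuba-style rearrangement of the odd part would reduce that to three, at the cost of extra additions.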