Paper Title
Enabling the Reflex Plane with the nanoPU
Paper Authors
Paper Abstract
Many recent papers have demonstrated fast in-network computation using programmable switches, running many orders of magnitude faster than CPUs. The main limitations of writing software for switches are the constrained programming model and limited state. In this paper we explore whether a new type of CPU, called the nanoPU, offers a useful middle ground, with a familiar C/C++ programming model, potentially many terabits/second of packet processing on a single chip, and an RPC response time of less than 1 $\mu$s. To evaluate the nanoPU, we prototype and benchmark three common network services on it: packet classification, network telemetry report processing, and consensus protocols. Each service is evaluated using cycle-accurate simulations on FPGAs in AWS. We find that packets are classified 2$\times$ faster and INT reports are processed more than an order of magnitude faster than state-of-the-art approaches. Our production-quality Raft consensus protocol, running on the nanoPU, writes to a 3-way replicated key-value store (MICA) in 3 $\mu$s, twice as fast as the state of the art, with a 99th-percentile tail latency of only 3.26 $\mu$s. To understand how these services can be combined, we study the design and performance of a {\em network reflex plane}, designed to process telemetry data, make fast control decisions, and update consistent, replicated state within a few microseconds.
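The abstract emphasizes that the nanoPU keeps a familiar C/C++ programming model for these services. As a rough illustration only (not the nanoPU's actual hardware interface), the sketch below shows the general shape of a run-to-completion request/response loop in plain C; `net_recv`, `net_send`, and the echo-style handler are hypothetical placeholders standing in for a low-latency message interface and real application logic such as packet classification or a key-value write.

```c
/* Hypothetical sketch of an RPC-style request handler in plain C.
 * net_recv()/net_send() are placeholders for a low-latency message
 * interface; they are NOT the nanoPU's actual API. */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define MAX_MSG_BYTES 1024

/* Placeholder: block until a request message arrives, return its length. */
size_t net_recv(uint8_t *buf, size_t cap);
/* Placeholder: transmit a response message. */
void net_send(const uint8_t *buf, size_t len);

/* Application-level handler: e.g., classify a packet header or apply a
 * replicated key-value write, then produce a small response. Here it
 * simply echoes the request as a stand-in for real work. */
static size_t handle_request(const uint8_t *req, size_t len,
                             uint8_t *resp, size_t cap) {
    size_t n = len < cap ? len : cap;
    memcpy(resp, req, n);
    return n;
}

/* Run-to-completion loop: receive, handle, respond. */
void serve_forever(void) {
    uint8_t req[MAX_MSG_BYTES], resp[MAX_MSG_BYTES];
    for (;;) {
        size_t rlen = net_recv(req, sizeof req);
        size_t slen = handle_request(req, rlen, resp, sizeof resp);
        net_send(resp, slen);
    }
}
```

The point of the sketch is only that the per-request logic is ordinary C code, in contrast to the constrained match-action programming model and limited state of programmable switches that the abstract contrasts against.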