Paper Title
A Moveable Beast: Partitioning Data and Compute for Computational Storage
Paper Authors
Paper Abstract
Over the years, hardware trends have introduced a variety of heterogeneous compute units while bringing network and storage bandwidths to within an order of magnitude of memory subsystem bandwidth. In response, developers have used increasingly exotic solutions to extract more performance from hardware, typically relying on static, design-time partitioning of their programs, which cannot keep pace with storage systems that are layering compute units throughout deepening hierarchies of storage devices. We argue that dynamic, just-in-time partitioning of computation offers a way for emerging data-intensive systems to cope with ever-growing data sizes in the face of stalled CPU performance and memory bandwidth. In this paper, we describe our prototype computational storage system (CSS), Skytether, which adopts a database perspective to utilize computational storage drives (CSDs). We also present MSG Express, a data management system for single-cell gene expression data that sits on top of Skytether. We discuss four design principles that guide the design of our CSS: support scientific applications; maximize utilization of storage, network, and memory bandwidth; minimize data movement; and enable flexible program execution on autonomous CSDs. Skytether is designed for the extra layer of indirection that CSDs introduce to a storage system, and it uses decomposable queries to take an approach to computational storage that has been imagined but not yet explored. We evaluate partition strategies, the overhead of function execution, and the performance of selection and projection. We expected a slowdown of ~3-4x on the CSDs compared to a consumer-grade client CPU, but we observed an unexpected slowdown of ~15x. Nevertheless, our evaluation results help us set anchor points in the design space for developing a cost model for decomposable queries and for partitioning data across many CSDs.