Kernel-as-a-Service: A Serverless Interface to GPUs

Paper Title

Kernel-as-a-Service: A Serverless Interface to GPUs

Authors

Nathan Pemberton, Anton Zabreyko, Zhoujie Ding, Randy Katz, Joseph Gonzalez

Abstract

Serverless computing has made it easier than ever to deploy applications over scalable cloud resources, all the while driving higher utilization for cloud providers. While this technique has worked well for easily divisible resources like CPU and local DRAM, it has struggled to incorporate more expensive and monolithic resources like GPUs or other application accelerators. We cannot simply slap a GPU on a FaaS platform and expect to keep all the benefits serverless promises. We need a more tailored approach if we want to best utilize these critical resources. In this paper we present Kernel-as-a-Service (KaaS), a serverless interface to GPUs. In KaaS, GPUs are first-class citizens that are invoked just like any other serverless function. Rather than mixing host and GPU code as is typically done, KaaS runs graphs of GPU-only code while host code is run on traditional functions. The KaaS system is responsible for managing GPU memory and schedules user kernels across the entire pool of available GPUs rather than relying on static allocations. This approach allows us to more effectively share expensive GPU resources, especially in multitenant environments like the cloud. We add support for KaaS to the Ray distributed computing framework and evaluate it with workloads including a TVM-based deep learning compiler and a BLAS library. Our results show that KaaS is able to drive up to 50x higher throughput and 16x lower latency when GPU resources are contended.
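The abstract describes two ideas that a small example may make more concrete: a request is a graph of GPU-only kernels, and the system (not the user) decides which GPU in the pool runs it. The sketch below is illustrative only and is not the KaaS API. The names `Kernel`, `KernelGraph`, `GpuPool`, `check_graph`, and the `est_cost` field are hypothetical, and the least-loaded placement policy is an assumption chosen for simplicity, not the scheduler from the paper. No GPU is involved; the pool is simulated with a list of load counters.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical types for illustration; they do not mirror the real KaaS interface.
@dataclass
class Kernel:
    name: str           # kernel entry point, e.g. "sgemm"
    inputs: List[str]   # names of buffers the kernel reads
    outputs: List[str]  # names of buffers the kernel writes


@dataclass
class KernelGraph:
    kernels: List[Kernel]   # listed in a valid execution order
    est_cost: float = 1.0   # assumed relative cost of the whole request


@dataclass
class GpuPool:
    n_gpus: int
    load: List[float] = field(init=False)

    def __post_init__(self) -> None:
        self.load = [0.0] * self.n_gpus

    def schedule(self, graph: KernelGraph) -> int:
        """Place the whole graph on the least-loaded GPU and return its index."""
        gpu = min(range(self.n_gpus), key=lambda i: self.load[i])
        self.load[gpu] += graph.est_cost
        return gpu


def check_graph(graph: KernelGraph, external_inputs: List[str]) -> None:
    """Raise ValueError if a kernel reads a buffer that nothing has produced yet."""
    available = set(external_inputs)
    for kernel in graph.kernels:
        missing = [buf for buf in kernel.inputs if buf not in available]
        if missing:
            raise ValueError(f"{kernel.name} reads undefined buffers: {missing}")
        available.update(kernel.outputs)


# A two-kernel request: matrix multiply followed by an activation.
request = KernelGraph(
    kernels=[
        Kernel("sgemm", inputs=["a", "b"], outputs=["c"]),
        Kernel("relu", inputs=["c"], outputs=["out"]),
    ],
    est_cost=3.0,
)
check_graph(request, external_inputs=["a", "b"])

# The caller never names a GPU; the pool chooses one per request.
pool = GpuPool(n_gpus=2)
placements = [
    pool.schedule(request),
    pool.schedule(KernelGraph(kernels=[], est_cost=1.0)),
    pool.schedule(KernelGraph(kernels=[], est_cost=1.0)),
]
```

In this sketch the request carries only kernel names and buffer names, so any host-side logic would have to live elsewhere, which mirrors the separation of host and GPU code that the abstract describes. A real system would also have to manage GPU memory for the named buffers, which the sketch leaves out.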
