Paper Title
In Datacenter Performance, The Only Constant Is Change
Authors
Abstract
All computing infrastructure suffers from performance variability, be it bare-metal or virtualized. This phenomenon originates from many sources: some transient, such as noisy neighbors, and others more permanent but sudden, such as changes or wear in hardware, changes in the underlying hypervisor stack, or even undocumented interactions between the policies of the computing resource provider and the active workloads. Thus, performance measurements obtained on clouds, HPC facilities, and, more generally, datacenter environments are almost guaranteed to exhibit performance regimes that evolve over time, which leads to undesirable nonstationarities in application performance. In this paper, we present our analysis of performance of the bare-metal hardware available on the CloudLab testbed where we focus on quantifying the evolving performance regimes using changepoint detection. We describe our findings, backed by a dataset with nearly 6.9M benchmark results collected from over 1600 machines over a period of 2 years and 9 months. These findings yield a comprehensive characterization of real-world performance variability patterns in one computing facility, a methodology for studying such patterns on other infrastructures, and contribute to a better understanding of performance variability in general.
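To make the idea of a performance regime concrete, the sketch below locates a single mean shift in a series of benchmark results by choosing the split that most reduces the within-segment squared error. It is only an illustration of changepoint detection in general, not the method used in the paper: the function name `best_mean_shift`, the `min_seg` parameter, and the synthetic series are all assumptions introduced here.

```python
def best_mean_shift(samples, min_seg=5):
    """Find the single split that best separates two mean levels.

    Returns (index, gain): `index` is the position where the second
    regime starts, and `gain` is the reduction in squared error obtained
    by modeling the series with two means instead of one. Returns
    (None, 0.0) if the series is too short to hold two segments.
    """
    n = len(samples)
    if n < 2 * min_seg:
        return None, 0.0

    # Prefix sums let us compute any segment's squared error in O(1).
    prefix, prefix_sq = [0.0], [0.0]
    for x in samples:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def sse(i, j):
        # Sum of squared deviations from the mean of samples[i:j].
        s = prefix[j] - prefix[i]
        sq = prefix_sq[j] - prefix_sq[i]
        return sq - s * s / (j - i)

    total = sse(0, n)
    best_k, best_gain = None, 0.0
    for k in range(min_seg, n - min_seg + 1):
        gain = total - (sse(0, k) + sse(k, n))
        if gain > best_gain:
            best_k, best_gain = k, gain
    return best_k, best_gain


# Hypothetical series: 20 runs near 100 units, then 20 runs near 90,
# with a small deterministic wobble standing in for measurement noise.
series = [(100.0 if i < 20 else 90.0) + 0.5 * (i % 3 - 1) for i in range(40)]
split, gain = best_mean_shift(series)
print(split, round(gain, 1))
```

A real analysis would need to handle multiple changepoints, changes in variance, and a principled threshold for deciding whether a candidate split is significant; this sketch covers none of those.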