Paper title
Design Strategies and Approximation Methods for High-Performance Computing Variability Management
Paper authors
Paper abstract
Performance variability management is an active research area in high-performance computing (HPC). We focus on input/output (I/O) variability. To study performance variability, computer scientists often use grid-based designs (GBDs) to collect I/O variability data and mathematical approximation methods to build prediction models. Mathematical approximation models can be biased, particularly when extrapolation is needed. Space-filling designs (SFDs) and surrogate models such as the Gaussian process (GP) are popular for data collection and for building predictive models, but their applicability to HPC variability needs investigation. We investigate their applicability in the HPC setting in terms of design efficiency, prediction accuracy, and scalability. We first customize existing SFDs so that they can be applied in the HPC setting. We then conduct a comprehensive investigation of design strategies and of the prediction ability of approximation methods, using both synthetic data simulated from three test functions and real data from the HPC setting, and compare the methods on the same three criteria. In both the synthetic and the real data analyses, GP with SFDs outperforms the alternatives in most scenarios. With respect to approximation models, GP is recommended if the data are collected by SFDs. If the data are collected using GBDs, both GP and Delaunay can be considered. With the best choice of approximation method, the relative performance of SFDs and GBDs depends on the properties of the underlying surface. In the cases where SFDs perform better, SFDs need about half as many design points as GBDs, or fewer, to achieve the same prediction accuracy. SFDs that can be tailored to high-dimensional and non-smooth surfaces are recommended, especially when a large number of input factors needs to be considered in the model.
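To make the contrast between the two design strategies concrete, the following minimal sketch builds a 16-point grid-based design and a 16-point Latin hypercube design on the unit square, then counts how many of 16 equal-width bins each design covers along one factor. The Latin hypercube is used here only as one common example of a space-filling design; the abstract does not say which SFDs the authors customize, and the function names (`grid_design`, `latin_hypercube`, `distinct_strata`) are hypothetical.

```python
import itertools
import random


def grid_design(levels_per_factor, n_factors):
    """Full-factorial grid on [0, 1]^d with evenly spaced levels."""
    levels = [i / (levels_per_factor - 1) for i in range(levels_per_factor)]
    return list(itertools.product(levels, repeat=n_factors))


def latin_hypercube(n_points, n_factors, rng):
    """Latin hypercube on [0, 1]^d: each factor gets one point per stratum."""
    columns = []
    for _ in range(n_factors):
        strata = list(range(n_points))
        rng.shuffle(strata)
        # Jitter each point uniformly inside its stratum [k/n, (k+1)/n).
        columns.append([(k + rng.random()) / n_points for k in strata])
    return list(zip(*columns))


def distinct_strata(design, factor, n_strata):
    """Count how many of n_strata equal-width bins one factor's values hit."""
    return len({min(int(p[factor] * n_strata), n_strata - 1) for p in design})


rng = random.Random(0)  # fixed seed so the sketch is reproducible
gbd = grid_design(levels_per_factor=4, n_factors=2)        # 4 x 4 = 16 points
sfd = latin_hypercube(n_points=16, n_factors=2, rng=rng)   # 16 points

gbd_cover = distinct_strata(gbd, factor=0, n_strata=16)
sfd_cover = distinct_strata(sfd, factor=0, n_strata=16)
print(f"grid covers {gbd_cover} of 16 bins, Latin hypercube covers {sfd_cover}")
```

With the same budget of 16 runs, the grid repeats only 4 distinct levels per factor, while the Latin hypercube places one point in each of the 16 bins. This illustrates one reason a space-filling design can be more efficient per design point; it does not reproduce the paper's comparison, which also depends on the approximation method and the underlying surface.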