Paper Title
Predict-and-Critic: Accelerated End-to-End Predictive Control for Cloud Computing through Reinforcement Learning
Paper Authors
Paper Abstract
Cloud computing holds the promise of reduced costs through economies of scale. To realize this promise, cloud computing vendors typically solve sequential resource allocation problems, where customer workloads are packed on shared hardware. Virtual machines (VMs) form the foundation of modern cloud computing as they help logically abstract user compute from shared physical infrastructure. Traditionally, VM packing problems are solved by predicting demand, followed by a Model Predictive Control (MPC) optimization over a future horizon. We introduce an approximate formulation of an industrial VM packing problem as an MILP with soft constraints parameterized by the predictions. Recently, predict-and-optimize (PnO) was proposed for end-to-end training of prediction models by back-propagating the cost of decisions through the optimization problem. However, PnO is unable to scale to the large prediction horizons prevalent in cloud computing. To tackle this issue, we propose the Predict-and-Critic (PnC) framework, which outperforms PnO with just a two-step horizon by leveraging reinforcement learning. PnC jointly trains a prediction model and a terminal Q function that approximates the cost-to-go over a long horizon, by back-propagating the cost of decisions through the optimization problem and from the future. The terminal Q function allows us to solve a much smaller two-step horizon optimization problem than the multi-step horizon necessary in PnO. We evaluate the PnO and PnC frameworks on two datasets (three workloads), including disturbances not modeled in the optimization problem. We find that PnC significantly improves decision quality over PnO, even when the optimization problem is not a perfect representation of reality. We also find that hardening the soft constraints of the MILP and back-propagating through the constraints improves decision quality for both PnO and PnC.
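The core idea above — replacing a long MPC horizon with a two-step optimization whose tail is summarized by a terminal value function — can be illustrated with a minimal, hypothetical sketch. This is not the paper's VM-packing MILP: the names (`stage_cost`, `full_horizon_plan`, `two_step_plan`, `terminal_q`) and the scalar-capacity toy problem are illustrative assumptions, and the optimizer is brute-force grid search rather than an MILP solver.

```python
from itertools import product

# Toy setting (hypothetical, not the paper's formulation): at each step we
# provision a scalar capacity; unmet demand is penalized via a soft constraint.

def stage_cost(capacity, demand, price=1.0, penalty=10.0):
    # provisioning cost plus soft-constraint penalty for unmet demand
    return price * capacity + penalty * max(0.0, demand - capacity)

def full_horizon_plan(demands, grid):
    # PnO-style MPC: jointly optimize decisions over the full H-step horizon
    return min(product(grid, repeat=len(demands)),
               key=lambda plan: sum(stage_cost(c, d)
                                    for c, d in zip(plan, demands)))

def two_step_plan(demands, grid, terminal_q):
    # PnC-style MPC: optimize only two steps; the remaining horizon's
    # cost-to-go is approximated by a terminal Q function (which the paper
    # learns jointly with the prediction model)
    def cost(plan):
        return (sum(stage_cost(c, d) for c, d in zip(plan, demands[:2]))
                + terminal_q(demands[2:]))
    return min(product(grid, repeat=2), key=cost)
```

In this decoupled toy, an exact tail cost makes the two-step plan agree with the full-horizon plan on its first decisions; the paper's contribution is learning such a Q function end-to-end for coupled problems, with decision costs back-propagated through the optimization.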