Lyapunov-guided Deep Reinforcement Learning for Stable Online Computation Offloading in Mobile-Edge Computing Networks

Paper Title

Lyapunov-guided Deep Reinforcement Learning for Stable Online Computation Offloading in Mobile-Edge Computing Networks

Authors

Suzhi Bi, Liang Huang, Hui Wang, Ying-Jun Angela Zhang

Abstract

Opportunistic computation offloading is an effective method to improve the computation performance of mobile-edge computing (MEC) networks under dynamic edge environment. In this paper, we consider a multi-user MEC network with time-varying wireless channels and stochastic user task data arrivals in sequential time frames. In particular, we aim to design an online computation offloading algorithm to maximize the network data processing capability subject to the long-term data queue stability and average power constraints. The online algorithm is practical in the sense that the decisions for each time frame are made without the assumption of knowing future channel conditions and data arrivals. We formulate the problem as a multi-stage stochastic mixed integer non-linear programming (MINLP) problem that jointly determines the binary offloading (each user computes the task either locally or at the edge server) and system resource allocation decisions in sequential time frames. To address the coupling in the decisions of different time frames, we propose a novel framework, named LyDROO, that combines the advantages of Lyapunov optimization and deep reinforcement learning (DRL). Specifically, LyDROO first applies Lyapunov optimization to decouple the multi-stage stochastic MINLP into deterministic per-frame MINLP subproblems. By doing so, it guarantees to satisfy all the long-term constraints by solving the per-frame subproblems that are much smaller in size. Then, LyDROO integrates model-based optimization and model-free DRL to solve the per-frame MINLP problems with low computational complexity. Simulation results show that under various network setups, the proposed LyDROO achieves optimal computation performance while stabilizing all queues in the system. Besides, it induces very low execution latency that is particularly suitable for real-time implementation in fast fading environments.
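The decoupling step described in the abstract can be illustrated with a generic drift-plus-penalty sketch. This is not the paper's formulation: the function names, the scoring rule, and the `rates`/`energies` tables are hypothetical, and the sketch ignores the shared resource allocation that couples users in the actual problem. It only shows how queue backlogs can turn a long-term constraint into a per-frame binary decision, here solved by brute-force enumeration (the expensive search that a learned policy is meant to avoid).

```python
import itertools


def update_queue(backlog, served, arrived):
    """Standard queue recursion: serve what is possible, then admit arrivals."""
    return max(backlog - served, 0.0) + arrived


def per_frame_decision(data_q, energy_q, rates, energies, V=1.0):
    """Pick binary offloading modes for one time frame.

    Hypothetical inputs (assumptions, not from the paper):
      data_q[i]      data queue backlog of user i
      energy_q[i]    virtual energy queue of user i (tracks power overuse)
      rates[i][x]    data processed by user i in mode x (0 = local, 1 = offload)
      energies[i][x] energy consumed by user i in mode x
      V              weight trading throughput against queue stability
    """
    best_choice, best_score = None, float("-inf")
    # Enumerate all 2^N binary offloading vectors (feasible only for small N).
    for choice in itertools.product((0, 1), repeat=len(data_q)):
        score = sum(
            (data_q[i] + V) * rates[i][x] - energy_q[i] * energies[i][x]
            for i, x in enumerate(choice)
        )
        if score > best_score:
            best_choice, best_score = choice, score
    return best_choice


if __name__ == "__main__":
    modes = per_frame_decision(
        data_q=[5.0, 0.1],
        energy_q=[0.0, 10.0],
        rates=[[1.0, 3.0], [1.0, 3.0]],
        energies=[[1.0, 2.0], [1.0, 2.0]],
    )
    print(modes)
```

In this toy setting a user with a long data queue and no energy debt is pushed toward the higher-rate mode, while a user with a large virtual energy queue is pushed toward the cheaper one.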

Paper Title