Paper Title


Effective Multi-User Delay-Constrained Scheduling with Deep Recurrent Reinforcement Learning

Authors

Pihe Hu, Ling Pan, Yu Chen, Zhixuan Fang, Longbo Huang

Abstract


Multi-user delay-constrained scheduling is important in many real-world applications, including wireless communication, live streaming, and cloud computing. Yet it poses a critical challenge, since the scheduler must make real-time decisions that guarantee the delay and resource constraints simultaneously, without prior knowledge of the system dynamics, which can be time-varying and hard to estimate. Moreover, many practical scenarios suffer from partial observability, e.g., due to sensing noise or hidden correlations. To tackle these challenges, we propose a deep reinforcement learning (DRL) algorithm named Recurrent Softmax Delayed Deep Double Deterministic Policy Gradient ($\mathtt{RSD4}$), a data-driven method based on a Partially Observable Markov Decision Process (POMDP) formulation. $\mathtt{RSD4}$ guarantees resource and delay constraints via a Lagrangian dual and delay-sensitive queues, respectively. It also efficiently handles partial observability through a memory mechanism enabled by a recurrent neural network (RNN), and introduces user-level decomposition and node-level merging to ensure scalability. Extensive experiments on simulated and real-world datasets demonstrate that $\mathtt{RSD4}$ is robust to system dynamics and partially observable environments, and achieves superior performance over existing DRL and non-DRL-based methods.
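The abstract mentions enforcing the resource constraint through a Lagrangian dual. A minimal sketch of that idea, independent of the paper's actual $\mathtt{RSD4}$ update rules: maintain a multiplier $\lambda$ that prices resource consumption, and raise or lower it by dual ascent depending on whether the scheduler's average usage exceeds the budget. The toy policy, step sizes, and reward shape below are illustrative assumptions, not the paper's method.

```python
import math

# Assumed toy setup: maximize a concave reward subject to an average
# per-step resource budget, using dual ascent on a Lagrange multiplier.
resource_budget = 1.0   # allowed average resource consumption per step
lam = 0.0               # Lagrange multiplier (price of resource use)
lr_lam = 0.05           # dual-ascent step size

def act(lam):
    """Toy policy: serve less aggressively as the resource price rises.

    Returns (reward, resource_used) for one step. This stands in for a
    learned policy; in constrained RL the policy would instead be trained
    against the Lagrangian reward  r - lam * resource_used.
    """
    intensity = 2.0 / (1.0 + lam)      # effort shrinks as price grows
    reward = math.log1p(intensity)     # diminishing returns on effort
    return reward, intensity

for _ in range(2000):
    _, used = act(lam)
    # Dual ascent: raise the price when the budget is exceeded,
    # lower it (clipped at 0) when there is slack.
    lam = max(0.0, lam + lr_lam * (used - resource_budget))

# At the fixed point the policy's resource use matches the budget.
_, used = act(lam)
print(round(lam, 2), round(used, 2))
```

With this toy policy the multiplier settles where usage equals the budget (here $\lambda = 1$, since $2/(1+\lambda) = 1$), illustrating how the dual variable automatically prices the constraint without the scheduler knowing the dynamics in advance.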
