Paper Title
Distributed-Training-and-Execution Multi-Agent Reinforcement Learning for Power Control in HetNet
Paper Authors
Paper Abstract
In heterogeneous networks (HetNets), the overlap of small cells and the macro cell causes severe cross-tier interference. Although some approaches exist to address this problem, they usually require global channel state information, which is hard to obtain in practice, and yield sub-optimal power allocation policies at high computational complexity. To overcome these limitations, we propose a multi-agent deep reinforcement learning (MADRL) based power control scheme for the HetNet, where each access point makes power control decisions independently based on local information. To promote cooperation among agents, we develop a penalty-based Q-learning (PQL) algorithm for MADRL systems. By introducing regularization terms into the loss function, each agent tends to choose an experienced action with a high reward when revisiting a state, which slows down the policy updating speed. In this way, an agent's policy can be learned by other agents more easily, resulting in a more efficient collaboration process. We then implement the proposed PQL in the considered HetNet and compare it with other distributed-training-and-execution (DTE) algorithms. Simulation results show that the proposed PQL can learn the desired power control policy in a dynamic environment where the locations of users change episodically, and that it outperforms existing DTE MADRL algorithms.
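To make the penalty idea concrete, the following Python (PyTorch) sketch shows one plausible way a single agent's per-update loss could combine the standard temporal-difference error with a regularization term that favors the action it already experienced in a revisited state. The abstract does not give the exact regularizer, so the function name pql_loss, the coefficient penalty_coef, and the specific penalty form below are illustrative assumptions, not the paper's formulation.

import torch
import torch.nn.functional as F

def pql_loss(q_net, target_net, batch, gamma=0.99, penalty_coef=0.1):
    # batch: (states, actions, rewards, next_states, dones) sampled from a
    # single agent's local replay buffer (DTE: no global information shared).
    states, actions, rewards, next_states, dones = batch

    # Standard one-step temporal-difference target.
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        td_target = rewards + gamma * (1.0 - dones) * next_q

    q_values = q_net(states)  # shape [batch, num_power_levels]
    q_taken = q_values.gather(1, actions.unsqueeze(1)).squeeze(1)
    td_loss = F.mse_loss(q_taken, td_target)

    # Assumed penalty / regularization term: penalize the advantage of the
    # current greedy action over the action actually experienced in this
    # (revisited) state, so the agent keeps favoring experienced high-reward
    # actions and its policy drifts slowly.
    penalty = (q_values.max(dim=1).values - q_taken).mean()

    return td_loss + penalty_coef * penalty

With penalty_coef set to zero this reduces to ordinary deep Q-learning; a larger coefficient slows each agent's policy changes, which is the mechanism the abstract credits for making one agent's policy easier for the other agents to learn.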