Remote Electrical Tilt Optimization via Safe Reinforcement Learning

Paper Title

Remote Electrical Tilt Optimization via Safe Reinforcement Learning

Authors

Filippo Vannella, Grigorios Iakovidis, Ezeddin Al Hakim, Erik Aumayr, Saman Feghhi

Abstract

Remote Electrical Tilt (RET) optimization is an efficient method for adjusting the vertical tilt angle of Base Stations (BSs) antennas in order to optimize Key Performance Indicators (KPIs) of the network. Reinforcement Learning (RL) provides a powerful framework for RET optimization because of its self-learning capabilities and adaptivity to environmental changes. However, an RL agent may execute unsafe actions during the course of its interaction, i.e., actions resulting in undesired network performance degradation. Since the reliability of services is critical for Mobile Network Operators (MNOs), the prospect of performance degradation has prohibited the real-world deployment of RL methods for RET optimization. In this work, we model the RET optimization problem in the Safe Reinforcement Learning (SRL) framework with the goal of learning a tilt control strategy providing performance improvement guarantees with respect to a safe baseline. We leverage a recent SRL method, namely Safe Policy Improvement through Baseline Bootstrapping (SPIBB), to learn an improved policy from an offline dataset of interactions collected by the safe baseline. Our experiments show that the proposed approach is able to learn a safe and improved tilt update policy, providing a higher degree of reliability and potential for real-world network deployment.
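To make the baseline-bootstrapping idea concrete, the following is a minimal sketch of one policy-improvement step in the style of the Πb-SPIBB variant, for a single state. The abstract does not say which SPIBB variant, action set, or count threshold the authors use, so the three tilt actions, the threshold `n_wedge`, and every number below are hypothetical and chosen only for illustration. The rule shown is: for state-action pairs seen fewer than `n_wedge` times in the offline dataset, copy the safe baseline's probability; move the remaining probability mass to the best-valued action among the well-observed ones.

```python
import numpy as np

def spibb_greedy_step(pi_b, q, counts, n_wedge):
    """One Pi_b-SPIBB-style improvement step for a single state (illustrative sketch).

    pi_b    : baseline action probabilities in this state
    q       : estimated action values in this state
    counts  : number of times each action was observed in this state in the dataset
    n_wedge : hypothetical count threshold below which an action is "uncertain"
    """
    pi_b = np.asarray(pi_b, dtype=float)
    q = np.asarray(q, dtype=float)
    counts = np.asarray(counts)

    uncertain = counts < n_wedge
    # On rarely observed actions, fall back to the safe baseline's probabilities.
    pi = np.where(uncertain, pi_b, 0.0)

    if (~uncertain).any():
        # Probability mass the baseline placed on well-observed actions.
        free_mass = pi_b[~uncertain].sum()
        # Give all of it to the best-valued well-observed action.
        trusted_q = np.where(uncertain, -np.inf, q)
        pi[np.argmax(trusted_q)] += free_mass
    return pi

# Hypothetical tilt actions: [down-tilt, no change, up-tilt]
pi_b = [0.2, 0.6, 0.2]     # safe baseline policy in this state
q = [1.0, 0.5, 2.0]        # value estimates from the offline dataset
counts = [50, 40, 3]       # up-tilt was rarely tried by the baseline
print(spibb_greedy_step(pi_b, q, counts, n_wedge=10))
```

In this example the up-tilt action has the highest estimated value but too few observations to be trusted, so it keeps its baseline probability of 0.2, and the remaining 0.8 goes to down-tilt, the best of the well-observed actions.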
