A Decoupled Learning Strategy for Massive Access Optimization in Cellular IoT Networks

Paper title

A Decoupled Learning Strategy for Massive Access Optimization in Cellular IoT Networks

Authors

Jiang, Nan; Deng, Yansha; Nallanathan, Arumugam; Yuan, Jinghong

Abstract

Cellular-based networks are expected to offer connectivity for massive Internet of Things (mIoT) systems. However, their Random Access CHannel (RACH) procedure is unreliable because of collisions caused by simultaneous massive access. Although existing RACH schemes address this collision problem, they usually organize the transmission and re-transmission of IoT devices with fixed parameters, and thus can hardly adapt to time-varying traffic patterns. Without adaptation, the RACH procedure easily suffers from high access delay, high energy consumption, or even access unavailability. To improve the RACH procedure, this paper aims to optimize it in real time by maximizing a long-term hybrid multi-objective function, which consists of the number of successfully accessing devices, the average energy consumption, and the average access delay. To do so, we first optimize the long-term objective in the number of successfully accessing devices by using Deep Reinforcement Learning (DRL) algorithms for different RACH schemes, including Access Class Barring (ACB), Back-Off (BO), and Distributed Queuing (DQ). The convergence capability and efficiency of different DRL algorithms, including Policy Gradient (PG), Actor-Critic (AC), Deep Q-Network (DQN), and Deep Deterministic Policy Gradient (DDPG), are compared. Inspired by the results of this comparison, a decoupled learning strategy is developed to jointly and dynamically adapt the access control factors of those three access schemes. This decoupled strategy first leverages a Recurrent Neural Network (RNN) model to predict the real-time traffic values of the network environment, and then uses multiple DRL agents to cooperatively configure the parameters of each RACH scheme.
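
To make the quantities named in the abstract concrete, the following is a minimal illustrative sketch, not the authors' model. It simulates a single RACH slot under ACB, where a device attempts access only if a uniform random draw falls below the ACB factor, and counts the devices whose preamble is chosen by no one else. The function names, the weights `w_s`, `w_e`, `w_d`, and the weighted-sum form of the reward are all assumptions made here for illustration; the abstract does not specify how the three objectives are combined.

```python
import random
from collections import Counter


def simulate_acb_slot(n_devices, acb_factor, n_preambles, rng):
    """Return how many devices succeed in one RACH slot (hypothetical toy model)."""
    # ACB check: a device attempts access only if its draw is below the ACB factor.
    attempting = sum(1 for _ in range(n_devices) if rng.random() < acb_factor)
    # Each attempting device picks one preamble uniformly at random.
    picks = Counter(rng.randrange(n_preambles) for _ in range(attempting))
    # A preamble picked by exactly one device is collision-free.
    return sum(1 for count in picks.values() if count == 1)


def hybrid_reward(successes, avg_energy, avg_delay, w_s=1.0, w_e=0.1, w_d=0.1):
    """Assumed weighted-sum reward: reward successes, penalize energy and delay."""
    return w_s * successes - w_e * avg_energy - w_d * avg_delay


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative numbers only: 200 backlogged devices, 54 preambles.
    for factor in (0.1, 0.3, 1.0):
        print(factor, simulate_acb_slot(200, factor, 54, rng))
```

In this toy setting, a DRL agent of the kind the abstract describes would choose `acb_factor` slot by slot and receive something like `hybrid_reward` as feedback; a fixed factor is what the abstract argues cannot track time-varying traffic.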
