Paper Title
Cellular UAV-to-Device Communications: Trajectory Design and Mode Selection by Multi-agent Deep Reinforcement Learning
Paper Authors
Paper Abstract
In current unmanned aircraft systems (UASs) for sensing services, unmanned aerial vehicles (UAVs) transmit their sensory data to terrestrial mobile devices over the unlicensed spectrum. However, the interference from surrounding terminals is uncontrollable because channel access is opportunistic. In this paper, we consider a cellular Internet of UAVs to guarantee the Quality-of-Service (QoS), where the sensory data can be transmitted to the mobile devices either by UAV-to-Device (U2D) communications over cellular networks, or directly through the base station (BS). Since UAVs' sensing and transmission may influence their trajectories, we study the trajectory design problem for UAVs, taking both sensing and transmission into account. This is a Markov decision problem (MDP) with a large state-action space; we therefore use multi-agent deep reinforcement learning (DRL) to approximate the state-action space, and then propose a multi-UAV trajectory design algorithm to solve the problem. Simulation results show that our proposed algorithm can achieve a higher total utility than the policy gradient algorithm and the single-agent algorithm.
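The abstract does not define the state, action, or reward of the MDP, so the following is only a toy sketch under stated assumptions: each UAV occupies one cell of a grid, chooses from a hypothetical five-move action set, and learns its own tabular Q-function independently of the others. It illustrates why the joint state-action space grows quickly with the number of UAVs, and what a single Q-learning step looks like for one agent. A deep variant would replace each table with a neural network approximator; all names and parameter values here are illustrative and not taken from the paper.

```python
from collections import defaultdict

# Hypothetical action set; the abstract does not specify one.
ACTIONS = ("stay", "north", "south", "east", "west")


def joint_space_size(num_cells, num_uavs, num_actions=len(ACTIONS)):
    """Count joint state-action pairs when each UAV sits in one grid cell."""
    return (num_cells * num_actions) ** num_uavs


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step for a single agent.

    q maps (state, action) pairs to values; alpha is the learning rate and
    gamma the discount factor (both illustrative defaults).
    """
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Independent learners: one Q-table per UAV, all entries starting at 0.
q_tables = [defaultdict(float) for _ in range(3)]
```

With 100 cells and 3 UAVs, `joint_space_size(100, 3)` is already 125,000,000 pairs, whereas each independent learner only has to cover 500. This gap is the usual motivation for combining a per-agent decomposition with function approximation.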