Coexistence between Task- and Data-Oriented Communications: A Whittle's Index Guided Multi-Agent Reinforcement Learning Approach

Paper title

Coexistence between Task- and Data-Oriented Communications: A Whittle's Index Guided Multi-Agent Reinforcement Learning Approach

Authors

Li, Ran; Huang, Chuan; Qin, Xiaoqi; Jiang, Shengpei; Ma, Nan; Cui, Shuguang

Abstract

We investigate the coexistence of task-oriented and data-oriented communications in an IoT system that shares a group of channels, and study the scheduling problem to jointly optimize the weighted age of incorrect information (AoII) and throughput, which are the performance metrics of the two types of communications, respectively. This problem is formulated as a Markov decision problem, which is difficult to solve due to the large discrete action space and the time-varying action constraints induced by the stochastic availability of channels. By exploiting the intrinsic properties of this problem and reformulating the reward function based on channel statistics, we first simplify the solution space, state space, and optimality criteria, and convert it to an equivalent Markov game, for which the large discrete action space issue is greatly relieved. Then, we propose a Whittle's index guided multi-agent proximal policy optimization (WI-MAPPO) algorithm to solve the considered game, where the embedded Whittle's index module further shrinks the action space, and the proposed offline training algorithm extends the training kernel of conventional MAPPO to address the issue of time-varying constraints. Finally, numerical results validate that the proposed algorithm significantly outperforms state-of-the-art age of information (AoI) based algorithms under scenarios with insufficient channel resources.
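
The scheduling setting in the abstract can be illustrated with a small toy simulation. The sketch below is not the WI-MAPPO algorithm and does not compute a Whittle's index. It assumes binary sources, a linear AoII penalty (the age grows by one per slot while the receiver's estimate is wrong and resets to zero once it is correct), and a greedy "largest weight times AoII first" rule standing in for an index policy. The names `step_aoii`, `schedule`, and `simulate`, and all parameter values, are hypothetical. The number of usable channels is redrawn every slot to mimic the time-varying action constraint.

```python
import random


def step_aoii(aoii, source_state, estimate):
    """Linear AoII: reset to 0 when the estimate is correct, else grow by one slot."""
    return 0 if source_state == estimate else aoii + 1


def schedule(weights, aoii, num_available):
    """Greedy stand-in for an index policy (an assumption, not a Whittle's index):
    pick the devices with the largest weight * AoII, at most num_available of them."""
    scores = [w * a for w, a in zip(weights, aoii)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:num_available]


def simulate(num_slots=1000, weights=(1.0, 2.0, 3.0), total_channels=2,
             flip_prob=0.2, avail_prob=0.7, seed=0):
    """Return the time-averaged weighted AoII of the toy system."""
    rng = random.Random(seed)
    n = len(weights)
    source, estimate, aoii = [0] * n, [0] * n, [0] * n
    total = 0.0
    for _ in range(num_slots):
        # Each binary source flips independently with probability flip_prob.
        source = [s ^ 1 if rng.random() < flip_prob else s for s in source]
        # Tentative AoII after the sources have moved.
        aoii = [step_aoii(a, s, e) for a, s, e in zip(aoii, source, estimate)]
        # Stochastic channel availability: the action constraint changes per slot.
        num_available = sum(rng.random() < avail_prob for _ in range(total_channels))
        # Scheduled devices deliver a fresh sample, so their estimate becomes correct.
        for i in schedule(weights, aoii, num_available):
            estimate[i] = source[i]
            aoii[i] = 0
        total += sum(w * a for w, a in zip(weights, aoii))
    return total / num_slots


if __name__ == "__main__":
    print("scarce channels:", simulate(total_channels=1))
    print("ample channels: ", simulate(total_channels=3, avail_prob=1.0))
```

In this toy model, any channel left unused by the task-oriented devices would be free for data-oriented traffic. That trade-off between weighted AoII and throughput is what the abstract's joint objective describes, and it is not modeled here.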
