Paper title
Learning Generalized Wireless MAC Communication Protocols via Abstraction
Paper authors
Paper abstract
To meet the heterogeneous requirements of beyond-5G (B5G) and future 6G wireless networks, conventional medium access control (MAC) procedures need to evolve so that base stations (BSs) and user equipments (UEs) can automatically learn new MAC protocols that cater to extremely diverse services. This topic has received significant attention, and several reinforcement learning (RL) algorithms, in which BSs and UEs are cast as agents, have been proposed for learning a communication policy from the agents' local observations. However, current approaches typically overfit to the environment they are trained in. They lack robustness against unseen conditions and fail to generalize to different environments. To overcome this problem, instead of learning a policy in the high-dimensional and redundant observation space, this work leverages the concept of observation abstraction (OA), which is rooted in extracting useful information from the environment. This in turn allows learning communication protocols that are more robust and generalize much better than current baselines. To learn the abstracted information from observations, we propose an architecture based on an autoencoder (AE) and integrate it into a multi-agent proximal policy optimization (MAPPO) framework. Simulation results corroborate the effectiveness of leveraging abstraction when learning protocols: the learned protocols generalize across environments that differ in the number of UEs, the number of data packets to transmit, and the channel conditions.
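The idea of observation abstraction can be illustrated with a toy example. The sketch below is not the architecture from the paper: it assumes a purely linear autoencoder, a made-up 6-dimensional observation that simply repeats two informative values, and plain stochastic gradient descent on the reconstruction error. All names (`LinearAutoencoder`, `redundant_observation`, `matvec`) are hypothetical. It only shows how a redundant observation might be compressed into a low-dimensional latent vector that a policy (for example, a MAPPO actor) could consume instead of the raw observation.

```python
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class LinearAutoencoder:
    """Hypothetical linear autoencoder: obs_dim -> latent_dim -> obs_dim."""

    def __init__(self, obs_dim, latent_dim, seed=0):
        rng = random.Random(seed)
        # Small random initial weights for the encoder and the decoder.
        self.enc = [[rng.uniform(-0.1, 0.1) for _ in range(obs_dim)]
                    for _ in range(latent_dim)]
        self.dec = [[rng.uniform(-0.1, 0.1) for _ in range(latent_dim)]
                    for _ in range(obs_dim)]

    def encode(self, obs):
        return matvec(self.enc, obs)

    def decode(self, z):
        return matvec(self.dec, z)

    def loss(self, obs):
        """Mean squared reconstruction error for one observation."""
        recon = self.decode(self.encode(obs))
        return sum((r - o) ** 2 for r, o in zip(recon, obs)) / len(obs)

    def train_step(self, obs, lr=0.1):
        """One SGD step on the reconstruction loss."""
        n = len(obs)
        z = self.encode(obs)
        recon = self.decode(z)
        # Gradient of the loss with respect to the reconstruction.
        err = [2.0 * (r - o) / n for r, o in zip(recon, obs)]
        # Gradient with respect to the latent, computed before updating dec.
        grad_z = [sum(self.dec[i][k] * err[i] for i in range(n))
                  for k in range(len(z))]
        for i in range(n):
            for k in range(len(z)):
                self.dec[i][k] -= lr * err[i] * z[k]
        for k in range(len(z)):
            for j in range(n):
                self.enc[k][j] -= lr * grad_z[k] * obs[j]


def redundant_observation(rng):
    """Toy observation: two informative values repeated three times."""
    a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    return [a, b, a, b, a, b]


rng = random.Random(1)
ae = LinearAutoencoder(obs_dim=6, latent_dim=2)
eval_set = [redundant_observation(rng) for _ in range(50)]

before = sum(ae.loss(o) for o in eval_set) / len(eval_set)
for _ in range(3000):
    ae.train_step(redundant_observation(rng))
after = sum(ae.loss(o) for o in eval_set) / len(eval_set)

# The 2-dimensional abstraction that a policy would take as input.
latent = ae.encode(eval_set[0])
print(f"reconstruction loss: {before:.4f} -> {after:.4f}")
```

In this toy setting the latent dimension matches the number of informative values, so little should be lost by acting on `latent` rather than on the full observation. How the abstraction is trained jointly with the policy is a design choice the abstract does not specify.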