Paper Title
Distributed Multi-Agent Reinforcement Learning with One-hop Neighbors and Compute Straggler Mitigation
Paper Authors
Paper Abstract
Most multi-agent reinforcement learning (MARL) methods are limited in the scale of problems they can handle. As the number of agents grows, the number of training iterations required to find the optimal behaviors increases exponentially due to the exponentially growing joint state and action spaces. This paper tackles this limitation by introducing a scalable MARL method called Distributed multi-Agent Reinforcement Learning with One-hop Neighbors (DARL1N). DARL1N is an off-policy actor-critic method that addresses the curse of dimensionality by restricting information exchange among the agents to one-hop neighbors when representing the value and policy functions. Each agent optimizes its value and policy functions over a one-hop neighborhood, which significantly reduces the learning complexity while maintaining expressiveness by training with varying numbers of neighbors and neighbor states. This structure allows us to formulate a distributed learning framework that further speeds up the training procedure. Distributed computing systems, however, contain straggler compute nodes that are slow or unresponsive due to communication bottlenecks or software and hardware problems. To mitigate the detrimental straggler effect, we introduce a novel coded distributed learning architecture, which leverages coding theory to improve the resilience of the learning system to stragglers. Comprehensive experiments show that DARL1N significantly reduces training time without sacrificing policy quality and scales well as the number of agents increases. Moreover, the coded distributed learning architecture improves training efficiency in the presence of stragglers.
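To make the one-hop restriction concrete, the following is a minimal sketch (not the authors' implementation) of a critic whose input is limited to an agent's own observation and action plus those of its one-hop neighbors, padded to a fixed maximum neighborhood size so the same network can be trained with varying neighbor counts. All names here (OneHopCritic, obs_dim, act_dim, max_neighbors) are illustrative assumptions rather than identifiers from the paper.

```python
# Hypothetical sketch of a one-hop-neighbor critic: each agent's value function
# conditions only on its own (observation, action) and those of its one-hop
# neighbors. Padded slots are masked out so absent neighbors contribute nothing,
# which lets one network handle a varying number of neighbors.
import torch
import torch.nn as nn


class OneHopCritic(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, max_neighbors: int, hidden: int = 64):
        super().__init__()
        # Input: own (obs, act) plus up to max_neighbors neighbor (obs, act) slots.
        in_dim = (1 + max_neighbors) * (obs_dim + act_dim)
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, own_obs, own_act, nbr_obs, nbr_act, nbr_mask):
        # own_obs: (B, obs_dim), nbr_obs: (B, max_neighbors, obs_dim),
        # nbr_mask: (B, max_neighbors) with 1 for a real neighbor, 0 for padding.
        nbr = torch.cat([nbr_obs, nbr_act], dim=-1) * nbr_mask.unsqueeze(-1)
        x = torch.cat([own_obs, own_act, nbr.flatten(1)], dim=-1)
        return self.net(x)


if __name__ == "__main__":
    B, obs_dim, act_dim, K = 4, 8, 2, 3
    critic = OneHopCritic(obs_dim, act_dim, K)
    q_value = critic(
        torch.randn(B, obs_dim), torch.randn(B, act_dim),
        torch.randn(B, K, obs_dim), torch.randn(B, K, act_dim),
        torch.tensor([[1.0, 1.0, 0.0]] * B),  # two real neighbors, one padded slot
    )
    print(q_value.shape)  # torch.Size([4, 1])
```

Because the critic's input dimension depends on the maximum neighborhood size rather than the total number of agents, its complexity stays fixed as the team grows, which is the property the abstract attributes to DARL1N; the coded distributed learning architecture for straggler mitigation is a separate component and is not depicted in this sketch.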