Paper Title

Ubiquitous Distributed Deep Reinforcement Learning at the Edge: Analyzing Byzantine Agents in Discrete Action Spaces

Paper Authors

Wenshuai Zhao, Jorge Peña Queralta, Li Qingqing, Tomi Westerlund

Paper Abstract

The integration of edge computing in next-generation mobile networks is bringing low-latency and high-bandwidth ubiquitous connectivity to a myriad of cyber-physical systems. This will further boost the increasing intelligence that is being embedded at the edge in various types of autonomous systems, where collaborative machine learning has the potential to play a significant role. This paper discusses some of the challenges in multi-agent distributed deep reinforcement learning that can occur in the presence of Byzantine or malfunctioning agents. As the simulation-to-reality gap gets bridged, the probability of malfunctions or errors must be taken into account. We show how wrong discrete actions can significantly affect the collaborative learning effort. In particular, we analyze the effect of having a fraction of agents that might perform the wrong action with a given probability. We study the ability of the system to converge towards a common working policy through the collaborative learning process, based on the number of experiences from each agent that are aggregated for each policy update, together with the fraction of wrong actions from agents experiencing malfunctions. Our experiments are carried out in a simulation environment using the Atari testbed for discrete action spaces, and advantage actor-critic (A2C) for the distributed multi-agent training.
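
To make the abstract's failure model concrete, below is a minimal Python sketch (not the authors' code) of one way to model an agent that executes a wrong discrete action with a given probability, written as a Gymnasium action wrapper. The class name ByzantineActionWrapper, the parameter wrong_action_prob, and the use of CartPole (chosen for self-containedness; the paper uses Atari environments) are all illustrative assumptions.

```python
import random

import gymnasium as gym


class ByzantineActionWrapper(gym.ActionWrapper):
    """With probability wrong_action_prob, replace the intended discrete
    action with a uniformly sampled *different* action, emulating a
    malfunctioning (Byzantine) agent. Hypothetical sketch, not the
    paper's implementation."""

    def __init__(self, env, wrong_action_prob):
        super().__init__(env)
        assert isinstance(env.action_space, gym.spaces.Discrete)
        self.wrong_action_prob = wrong_action_prob

    def action(self, action):
        if random.random() < self.wrong_action_prob:
            n = int(self.env.action_space.n)
            # Sample uniformly from the n-1 actions that differ
            # from the intended one.
            wrong = random.randrange(n - 1)
            return wrong if wrong < action else wrong + 1
        return action


# Example: 2 of 8 parallel actors are faulty and act wrongly 10% of
# the time. In the paper's setting these would be Atari environments.
envs = []
for i in range(8):
    env = gym.make("CartPole-v1")
    if i < 2:
        env = ByzantineActionWrapper(env, wrong_action_prob=0.1)
    envs.append(env)
```

An A2C learner collecting a fixed number of experiences from each of these actors per policy update could then vary the two quantities studied in the paper: the fraction of wrapped (faulty) actors and wrong_action_prob.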
