Paper Title

Cooperative Actor-Critic via TD Error Aggregation

Paper Authors

Martin Figura, Yixuan Lin, Ji Liu, Vijay Gupta

Paper Abstract

In decentralized cooperative multi-agent reinforcement learning, agents can aggregate information from one another to learn policies that maximize a team-average objective function. Despite their willingness to cooperate, individual agents may find directly sharing information about their local state, reward, and value function undesirable due to privacy concerns. In this work, we introduce a decentralized actor-critic algorithm with TD error aggregation that does not compromise privacy and that tolerates communication channels subject to time delays and packet dropouts. The cost we pay for making such weak assumptions is an increased communication burden for every agent, as measured by the dimension of the transmitted data. Interestingly, the communication burden is only quadratic in the graph size, which renders the algorithm applicable to large networks. We provide a convergence analysis under a diminishing step size to verify that the agents maximize the team-average objective function.
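The abstract describes the mechanism only at a high level, so the following is a minimal simulation sketch of the idea, not the paper's algorithm: the ring topology, the freshness-stamp merging rule, the placeholder local_td_error, and all constants (N, T, GAMMA, DROP_PROB, MAX_DELAY) are assumptions made purely for illustration. The sketch shows how agents can exchange only TD errors over delayed, lossy channels, relaying them so that every agent builds an estimate of the team-average TD error while its state, reward, and value function never leave the agent.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 5            # number of agents (hypothetical small network)
T = 300          # iterations (assumed)
GAMMA = 0.95     # discount factor (assumed, used only conceptually below)
DROP_PROB = 0.2  # packet-dropout probability (assumed)
MAX_DELAY = 2    # maximum communication delay, in iterations (assumed)

# Ring topology (assumed): agent i talks only to its two ring neighbors,
# so TD errors of non-neighbors must be relayed, which is why each
# transmitted message is an N-dimensional vector.
neighbors = {i: [(i + 1) % N, (i - 1) % N] for i in range(N)}

# est[i, j]: agent i's current copy of agent j's TD error.
# stamp[i, j]: how fresh that copy is, so stale relays can be ignored.
est = np.zeros((N, N))
stamp = np.full((N, N), -1)


def local_td_error(i, t):
    """Placeholder for delta_i = r_i + GAMMA * V_i(s') - V_i(s).
    A real critic computes this from the agent's *private* reward and
    value function; only the scalar delta_i ever leaves the agent."""
    return rng.normal(loc=np.sin(0.01 * t + i), scale=0.3)


in_flight = []  # packets in transit: (deliver_at, receiver, est_row, stamp_row)

for t in range(T):
    # 1) Compute local TD errors; private data stays local.
    for i in range(N):
        est[i, i] = local_td_error(i, t)
        stamp[i, i] = t

    # 2) Send the whole N-dimensional estimate vector to each neighbor,
    #    subject to random delays and dropouts.
    for i in range(N):
        for j in neighbors[i]:
            if rng.random() < DROP_PROB:
                continue  # packet lost
            deliver_at = t + rng.integers(0, MAX_DELAY + 1)
            in_flight.append((deliver_at, j, est[i].copy(), stamp[i].copy()))

    # 3) Deliver due packets; for each entry, keep whichever copy is fresher.
    pending = []
    for deliver_at, j, row, row_stamp in in_flight:
        if deliver_at <= t:
            newer = row_stamp > stamp[j]
            est[j, newer] = row[newer]
            stamp[j, newer] = row_stamp[newer]
        else:
            pending.append((deliver_at, j, row, row_stamp))
    in_flight = pending

    # 4) Each agent's estimate of the team-average TD error would now
    #    drive its critic and actor updates with a diminishing step size.
    alpha_t = 1.0 / (t + 1)            # diminishing step size
    team_avg_delta = est.mean(axis=1)  # one scalar estimate per agent
```

Relaying the full estimate vector, rather than a single scalar, is what lets information reach non-neighbors over a sparse graph despite dropped and delayed packets; it is also why each message grows linearly with the number of agents, consistent with the quadratic total communication burden noted in the abstract.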
