Paper Title
Non-cooperative Multi-agent Systems with Exploring Agents
Paper Authors
Paper Abstract
Multi-agent learning is a challenging problem in machine learning with applications in different domains such as distributed control, robotics, and economics. We develop a prescriptive model of multi-agent behavior using Markov games. Since in many multi-agent systems agents do not necessarily select their optimal strategies against other agents (e.g., multi-pedestrian interaction), we focus on models in which the agents play "exploring but near-optimal strategies". We model such policies using the Boltzmann-Gibbs distribution. This leads to a set of coupled Bellman equations that describes the behavior of the agents. We introduce a set of conditions under which the set of equations admits a unique solution and propose two algorithms that provably provide the solution in finite and infinite time horizon scenarios. We also study a practical setting in which the interactions can be described using occupancy measures, and propose a simplified Markov game with lower complexity. Furthermore, we establish the connection between Markov games with exploration strategies and the principle of maximum causal entropy for multi-agent systems. Finally, we evaluate the performance of our algorithms via several well-known games from the literature and some games designed based on real-world applications.
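The abstract does not give the paper's equations, but the two central ingredients it names, a Boltzmann-Gibbs (softmax) policy over Q-values and the resulting "soft" Bellman recursion, can be sketched generically. The following is a minimal single-agent illustration, not the paper's multi-agent algorithm: the temperature `beta`, the MDP shapes, and the fixed iteration count are all assumptions made for the sketch.

```python
import numpy as np

def boltzmann_policy(q, beta=1.0):
    """Boltzmann-Gibbs (softmax) distribution over actions given Q-values.

    pi(a) is proportional to exp(beta * Q(a)); large beta approaches the
    greedy policy, beta -> 0 gives uniform exploration.
    """
    z = beta * (q - np.max(q))          # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

def soft_bellman_backup(P, R, beta=1.0, gamma=0.9, iters=500):
    """Soft value iteration on a single-agent MDP (illustrative only).

    P: (S, A, S) transition tensor, R: (S, A) reward matrix.
    The hard max of standard value iteration is replaced by the
    log-sum-exp ("soft max") operator induced by a Boltzmann-Gibbs policy.
    Returns the soft values v (S,) and Q-values q (S, A).
    """
    S, A = R.shape
    v = np.zeros(S)
    q = np.zeros((S, A))
    for _ in range(iters):
        q = R + gamma * P @ v            # (S, A) Q-values under current v
        m = q.max(axis=1)                # stabilized log-sum-exp per state
        v = m + (1.0 / beta) * np.log(
            np.exp(beta * (q - m[:, None])).sum(axis=1))
    return v, q
```

In the paper's non-cooperative setting each agent would solve a backup of this kind coupled to the other agents' policies; the sketch above only shows the exploration mechanism itself.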