Paper Title
Model-based Multi-Agent Reinforcement Learning with Cooperative Prioritized Sweeping
Paper Authors
Paper Abstract
We present a new model-based reinforcement learning algorithm, Cooperative Prioritized Sweeping, for efficient learning in multi-agent Markov decision processes. The algorithm allows for sample-efficient learning on large problems by exploiting a factorization to approximate the value function. Our approach only requires knowledge about the structure of the problem in the form of a dynamic decision network. Using this information, our method learns a model of the environment and performs temporal difference updates that affect multiple joint states and actions at once. Batch updates are additionally performed that efficiently back-propagate knowledge throughout the factored Q-function. Our method outperforms the state-of-the-art sparse cooperative Q-learning algorithm, both on the well-known SysAdmin benchmark and on randomized environments.
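To make the "batch updates that back-propagate knowledge" concrete, the sketch below illustrates classic single-agent, tabular prioritized sweeping: after each real transition, predecessors whose Q-values would change significantly are pushed onto a priority queue and replayed in order of expected impact. This is only an illustration of the underlying sweeping mechanism, not the paper's cooperative, factored multi-agent variant; the hyperparameters `THETA` and `N_SWEEPS` are assumed names, not taken from the paper.

```python
import heapq
from collections import defaultdict

GAMMA = 0.95   # discount factor
THETA = 1e-4   # priority threshold (assumed hyperparameter)
N_SWEEPS = 10  # batch (planning) updates per real step

Q = defaultdict(float)            # Q[(state, action)]
model = {}                        # model[(s, a)] = (reward, next_state)
predecessors = defaultdict(set)   # predecessors[s'] = {(s, a), ...}
pqueue = []                       # max-priority queue via negated priorities

def best_value(s, actions):
    """Greedy state value under the current Q-table."""
    return max(Q[(s, a)] for a in actions)

def observe(s, a, r, s2, actions):
    """Record one real transition, then sweep high-priority predecessors."""
    model[(s, a)] = (r, s2)
    predecessors[s2].add((s, a))
    priority = abs(r + GAMMA * best_value(s2, actions) - Q[(s, a)])
    if priority > THETA:
        heapq.heappush(pqueue, (-priority, (s, a)))
    for _ in range(N_SWEEPS):
        if not pqueue:
            break
        _, (ps, pa) = heapq.heappop(pqueue)
        pr, ps2 = model[(ps, pa)]
        Q[(ps, pa)] = pr + GAMMA * best_value(ps2, actions)
        # Propagate backwards: predecessors of ps may now be stale too.
        for (qs, qa) in predecessors[ps]:
            qr, qs2 = model[(qs, qa)]
            p = abs(qr + GAMMA * best_value(qs2, actions) - Q[(qs, qa)])
            if p > THETA:
                heapq.heappush(pqueue, (-p, (qs, qa)))
```

On a two-step chain (`s0 → s1 → s2` with reward 1 on the second step), observing the rewarding transition first and the earlier one second lets the sweep propagate the value back immediately, so `Q[('s0', 0)]` already reflects the discounted reward after a single observation of each transition. The paper's contribution replaces this flat Q-table with a factored Q-function over the dynamic decision network, so each update touches many joint states and actions at once.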