Episodic Logit-Q Dynamics for Efficient Learning in Stochastic Teams

Paper Title

Episodic Logit-Q Dynamics for Efficient Learning in Stochastic Teams

Authors

Onur Unlu, Muhammed O. Sayin

Abstract

We present new learning dynamics that combine (independent) log-linear learning and value iteration for stochastic games within the auxiliary stage game framework. The dynamics provably attain the efficient equilibrium (also known as the optimal equilibrium) in identical-interest stochastic games, going beyond recent work, which has concentrated on provable convergence to some (possibly inefficient) equilibrium. The dynamics are also independent in the sense that agents take actions consistent with their local viewpoint to a reasonable extent rather than seeking equilibrium. These aspects can be of practical interest in the control applications of intelligent and autonomous systems. The key challenges are convergence to an inefficient equilibrium and the non-stationarity of the environment from a single agent's viewpoint due to the adaptation of the others. The log-linear update plays an important role in addressing the former. We address the latter through a play-in-episodes scheme in which the agents update their Q-function estimates only at the end of each episode.
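
To illustrate the play-in-episodes idea described in the abstract, here is a minimal Python sketch for two agents. It is not the authors' algorithm. The function name `episodic_logit_q`, the temperature `tau`, the known reward and transition arrays `R` and `P`, the rule that one randomly chosen agent revises its action per step, and the end-of-episode value estimate (the episode's empirical average of Q) are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, tau):
    """Log-linear (softmax) choice probabilities at temperature tau."""
    z = (x - x.max()) / tau
    p = np.exp(z)
    return p / p.sum()


def episodic_logit_q(R, P, gamma=0.9, tau=0.05, episodes=60, steps=200, seed=0):
    """Illustrative sketch only (assumed setup, not the paper's exact dynamics).

    R[s, a1, a2]     : shared (identical-interest) reward.
    P[s, a1, a2, s'] : transition probabilities, assumed known here.
    """
    rng = np.random.default_rng(seed)
    S, A1, A2 = R.shape
    Q = np.zeros((S, A1, A2))
    v = np.zeros(S)
    last = np.zeros((S, 2), dtype=int)  # last joint action played in each state

    for _ in range(episodes):
        visits = np.zeros((S, A1, A2))
        s = rng.integers(S)
        for _ in range(steps):
            # Q is frozen within the episode, so each state hosts a
            # stationary auxiliary stage game with payoff Q[s].
            if rng.integers(2) == 0:
                # Agent 1 revises against agent 2's last action in this state.
                last[s, 0] = rng.choice(A1, p=softmax(Q[s, :, last[s, 1]], tau))
            else:
                # Agent 2 revises against agent 1's last action in this state.
                last[s, 1] = rng.choice(A2, p=softmax(Q[s, last[s, 0], :], tau))
            a1, a2 = last[s]
            visits[s, a1, a2] += 1
            s = rng.choice(S, p=P[s, a1, a2])

        # End of episode: estimate each visited state's value from the
        # episode's empirical play, then take one value-iteration-style step.
        for state in range(S):
            n = visits[state].sum()
            if n > 0:
                v[state] = (visits[state] * Q[state]).sum() / n
        Q = R + gamma * (P @ v)
    return Q


# Hypothetical two-state coordination game: joint action (0, 0) pays 1,
# joint action (1, 1) pays 0.5, and miscoordination pays 0.
R = np.zeros((2, 2, 2))
R[:, 0, 0] = 1.0
R[:, 1, 1] = 0.5
P = np.full((2, 2, 2, 2), 0.5)  # action-independent uniform transitions
Q = episodic_logit_q(R, P)
print([np.unravel_index(Q[s].argmax(), Q[s].shape) for s in range(2)])
```

Freezing Q during an episode is what keeps each stage game stationary from a single agent's viewpoint, while a low temperature makes the log-linear revisions concentrate on the joint action with the highest shared payoff rather than on an arbitrary equilibrium.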
