Paper title
Uncoupled Learning Dynamics with $O(\log T)$ Swap Regret in Multiplayer Games
Paper authors
Paper abstract
In this paper we establish efficient and \emph{uncoupled} learning dynamics so that, when employed by all players in a general-sum multiplayer game, the \emph{swap regret} of each player after $T$ repetitions of the game is bounded by $O(\log T)$, improving over the prior best bounds of $O(\log^4 (T))$. At the same time, we guarantee optimal $O(\sqrt{T})$ swap regret in the adversarial regime as well. To obtain these results, our primary contribution is to show that when all players follow our dynamics with a \emph{time-invariant} learning rate, the \emph{second-order path lengths} of the dynamics up to time $T$ are bounded by $O(\log T)$, a fundamental property which could have further implications beyond near-optimally bounding the (swap) regret. Our proposed learning dynamics combine in a novel way \emph{optimistic} regularized learning with the use of \emph{self-concordant barriers}. Further, our analysis is remarkably simple, bypassing the cumbersome framework of higher-order smoothness recently developed by Daskalakis, Fishelson, and Golowich (NeurIPS'21).
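For readers unfamiliar with the quantity being bounded, the following is a minimal sketch of how the swap regret of a single player can be computed from a recorded history of play. It is not the learning dynamics proposed in the paper. The function name `swap_regret` and the array layout (one row per round, one column per action) are assumptions made for illustration. The sketch relies on the standard fact that the best swap function can be chosen independently for each action.

```python
import numpy as np

def swap_regret(strategies, utilities):
    """Swap regret of one player over T rounds (illustrative helper).

    strategies: (T, n) array, row t is the mixed strategy x_t over n actions.
    utilities:  (T, n) array, row t is the utility vector u_t observed at round t.
    """
    x = np.asarray(strategies, dtype=float)
    u = np.asarray(utilities, dtype=float)
    # gains[a, b] = sum_t x_t[a] * u_t[b]: the utility collected if all the
    # probability mass played on action a had been moved to action b instead.
    gains = x.T @ u
    # Utility actually collected by the mass placed on each action a.
    realized = np.diag(gains)
    # The best swap function picks, for each a separately, the best target b.
    return float(np.sum(gains.max(axis=1) - realized))

# Always playing action 0 while action 1 pays 1 per round: swap regret is 2.
print(swap_regret([[1, 0], [1, 0]], [[0, 1], [0, 1]]))
```

Because the identity map is itself a valid swap function, the value returned is never negative.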