Uncoupled Learning Dynamics with $O(\log T)$ Swap Regret in Multiplayer Games

Paper Title

Uncoupled Learning Dynamics with $O(\log T)$ Swap Regret in Multiplayer Games

Authors

Ioannis Anagnostides, Gabriele Farina, Christian Kroer, Chung-Wei Lee, Haipeng Luo, Tuomas Sandholm

Abstract

In this paper we establish efficient and \emph{uncoupled} learning dynamics so that, when employed by all players in a general-sum multiplayer game, the \emph{swap regret} of each player after $T$ repetitions of the game is bounded by $O(\log T)$, improving over the prior best bounds of $O(\log^4 (T))$. At the same time, we guarantee optimal $O(\sqrt{T})$ swap regret in the adversarial regime as well. To obtain these results, our primary contribution is to show that when all players follow our dynamics with a \emph{time-invariant} learning rate, the \emph{second-order path lengths} of the dynamics up to time $T$ are bounded by $O(\log T)$, a fundamental property which could have further implications beyond near-optimally bounding the (swap) regret. Our proposed learning dynamics combine in a novel way \emph{optimistic} regularized learning with the use of \emph{self-concordant barriers}. Further, our analysis is remarkably simple, bypassing the cumbersome framework of higher-order smoothness recently developed by Daskalakis, Fishelson, and Golowich (NeurIPS'21).
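For readers unfamiliar with the quantity being bounded, the sketch below computes the swap regret of a fixed sequence of mixed strategies under the standard definition: the largest gain obtainable in hindsight by rerouting the probability mass of each action $a$ to some action $\phi(a)$. It illustrates the definition only and is not the learning dynamics proposed in the paper; the function name `swap_regret` and the list-of-lists input layout are assumptions made for this example.

```python
def swap_regret(strategies, utilities):
    """Swap regret of a sequence of mixed strategies (illustrative helper).

    strategies[t][a] is the probability placed on action a at round t.
    utilities[t][a] is the utility of action a at round t.
    """
    n = len(strategies[0])
    total = 0.0
    for a in range(n):
        # Utility actually collected through the mass placed on action a.
        realized = sum(x[a] * u[a] for x, u in zip(strategies, utilities))
        # Best utility from rerouting all mass on a to one fixed action b.
        # The maximum over swap functions decomposes action by action.
        best = max(
            sum(x[a] * u[b] for x, u in zip(strategies, utilities))
            for b in range(n)
        )
        total += best - realized
    return total


# Two rounds, two actions: the player always picks the worse action.
strategies = [[1.0, 0.0], [0.0, 1.0]]
utilities = [[0.0, 1.0], [1.0, 0.0]]
print(swap_regret(strategies, utilities))
```

Because each action may be rerouted independently, swap regret is always at least as large as the usual external regret, which is why a bound on it is the stronger guarantee.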
