Paper Title
Near-Optimal No-Regret Learning Dynamics for General Convex Games
Paper Authors
Paper Abstract
A recent line of work has established uncoupled learning dynamics such that, when employed by all players in a game, each player's \emph{regret} after $T$ repetitions grows polylogarithmically in $T$, an exponential improvement over the traditional guarantees within the no-regret framework. However, so far these results have been limited to certain classes of games with structured strategy spaces -- such as normal-form and extensive-form games. Whether $O(\text{polylog} T)$ regret bounds can be obtained for general convex and compact strategy sets -- which arise in many fundamental models in economics and multiagent systems -- while retaining efficient strategy updates has remained an important open question. In this paper, we answer this question in the positive by establishing the first uncoupled learning algorithm with $O(\log T)$ per-player regret in general \emph{convex games}, that is, games with concave utility functions supported on arbitrary convex and compact strategy sets. Our learning dynamics are based on an instantiation of optimistic follow-the-regularized-leader over an appropriately \emph{lifted} space using a \emph{self-concordant regularizer} that is, peculiarly, not a barrier for the feasible region. Further, our learning dynamics are efficiently implementable given access to a proximal oracle for the convex strategy set, leading to $O(\log\log T)$ per-iteration complexity; we also give extensions when access to only a \emph{linear} optimization oracle is assumed. Finally, we adapt our dynamics to guarantee $O(\sqrt{T})$ regret in the adversarial regime. Even in those special cases where prior results apply, our algorithm improves over the state-of-the-art regret bounds either in terms of the dependence on the number of iterations or on the dimension of the strategy sets.
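To make the abstract's description of the dynamics concrete, the following is a minimal sketch of generic optimistic follow-the-regularized-leader in self-play. It is not the paper's algorithm: it uses a plain Euclidean regularizer and the probability simplex as an example strategy set (stand-ins for the lifted space, self-concordant regularizer, and general proximal oracle described above), and the names `optimistic_ftrl_step` and `project_simplex` are illustrative, not from the paper.

```python
import numpy as np

def optimistic_ftrl_step(cum_grad, pred_grad, eta, prox_oracle):
    """One optimistic-FTRL update for a single player.

    cum_grad    : sum of utility gradients observed so far
    pred_grad   : prediction of the next gradient (commonly the most recent one)
    eta         : learning rate
    prox_oracle : maps an unconstrained point onto the strategy set
                  (simple stand-in for the paper's proximal oracle)
    """
    # Maximize <x, cum_grad + pred_grad> - (1/eta) * R(x); with the Euclidean
    # regularizer R(x) = ||x||^2 / 2 this is a projection of
    # eta * (cum_grad + pred_grad) onto the feasible strategy set.
    return prox_oracle(eta * (cum_grad + pred_grad))

def project_simplex(y):
    """Euclidean projection onto the probability simplex (example strategy set)."""
    u = np.sort(y)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(y) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(y + theta, 0.0)

# Toy self-play loop on a two-player zero-sum matrix game,
# where each player's utility gradient is linear in the opponent's strategy.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))          # payoff matrix for player 1
eta = 0.1
cum1, cum2 = np.zeros(3), np.zeros(3)    # cumulative gradients
pred1, pred2 = np.zeros(3), np.zeros(3)  # gradient predictions
for t in range(1000):
    x = optimistic_ftrl_step(cum1, pred1, eta, project_simplex)
    y = optimistic_ftrl_step(cum2, pred2, eta, project_simplex)
    g1, g2 = A @ y, -A.T @ x             # utility gradients for each player
    cum1 += g1; cum2 += g2
    pred1, pred2 = g1, g2                # predict the next gradient with the last one
```

The "optimism" lies entirely in adding the predicted gradient to the cumulative sum before regularizing; the paper's $O(\log T)$ guarantees come from carrying out this template with a self-concordant (non-barrier) regularizer over a lifted space, not from the Euclidean setup sketched here.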