Paper Title
Near-Optimal No-Regret Learning Dynamics for General Convex Games
Paper Authors
Paper Abstract
A recent line of work has established uncoupled learning dynamics such that, when employed by all players in a game, each player's \emph{regret} after $T$ repetitions grows polylogarithmically in $T$, an exponential improvement over the traditional guarantees within the no-regret framework. However, so far these results have been limited to certain classes of games with structured strategy spaces -- such as normal-form and extensive-form games. Whether $O(\text{polylog} T)$ regret bounds can be obtained for general convex and compact strategy sets -- which arise in many fundamental models in economics and multiagent systems -- while retaining efficient strategy updates has remained an important open question. In this paper, we answer this question in the positive by establishing the first uncoupled learning algorithm with $O(\log T)$ per-player regret in general \emph{convex games}, that is, games with concave utility functions supported on arbitrary convex and compact strategy sets. Our learning dynamics are based on an instantiation of optimistic follow-the-regularized-leader over an appropriately \emph{lifted} space using a \emph{self-concordant regularizer} that is, peculiarly, not a barrier for the feasible region. Further, our learning dynamics are efficiently implementable given access to a proximal oracle for the convex strategy set, leading to $O(\log\log T)$ per-iteration complexity; we also give extensions when access to only a \emph{linear} optimization oracle is assumed. Finally, we adapt our dynamics to guarantee $O(\sqrt{T})$ regret in the adversarial regime. Even in those special cases where prior results apply, our algorithm improves over the state-of-the-art regret bounds either in terms of the dependence on the number of iterations or on the dimension of the strategy sets.
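To make the abstract's description of the dynamics concrete, the following is a minimal sketch of generic optimistic follow-the-regularized-leader in self-play. It is not the paper's algorithm: it uses a plain Euclidean regularizer and the probability simplex as an example strategy set (stand-ins for the lifted space, self-concordant regularizer, and general proximal oracle described above), and the names `optimistic_ftrl_step` and `project_simplex` are illustrative, not from the paper.

```python
import numpy as np

def optimistic_ftrl_step(cum_grad, pred_grad, eta, prox_oracle):
    """One optimistic-FTRL update for a single player.

    cum_grad    : sum of utility gradients observed so far
    pred_grad   : prediction of the next gradient (commonly the most recent one)
    eta         : learning rate
    prox_oracle : maps an unconstrained point onto the strategy set
                  (simple stand-in for the paper's proximal oracle)
    """
    # Maximize <x, cum_grad + pred_grad> - (1/eta) * R(x); with the Euclidean
    # regularizer R(x) = ||x||^2 / 2 this is a projection of
    # eta * (cum_grad + pred_grad) onto the feasible strategy set.
    return prox_oracle(eta * (cum_grad + pred_grad))

def project_simplex(y):
    """Euclidean projection onto the probability simplex (example strategy set)."""
    u = np.sort(y)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(y) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(y + theta, 0.0)

# Toy self-play loop on a two-player zero-sum matrix game,
# where each player's utility gradient is linear in the opponent's strategy.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))          # payoff matrix for player 1
eta = 0.1
cum1, cum2 = np.zeros(3), np.zeros(3)    # cumulative gradients
pred1, pred2 = np.zeros(3), np.zeros(3)  # gradient predictions
for t in range(1000):
    x = optimistic_ftrl_step(cum1, pred1, eta, project_simplex)
    y = optimistic_ftrl_step(cum2, pred2, eta, project_simplex)
    g1, g2 = A @ y, -A.T @ x             # utility gradients for each player
    cum1 += g1; cum2 += g2
    pred1, pred2 = g1, g2                # predict the next gradient with the last one
```

The "optimism" lies entirely in adding the predicted gradient to the cumulative sum before regularizing; the paper's $O(\log T)$ guarantees come from carrying out this template with a self-concordant (non-barrier) regularizer over a lifted space, not from the Euclidean setup sketched here.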