Title

Offline congestion games: How feedback type affects data coverage requirement

Authors

Haozhe Jiang, Qiwen Cui, Zhihan Xiong, Maryam Fazel, Simon S. Du

Abstract

This paper investigates when one can efficiently recover an approximate Nash Equilibrium (NE) in offline congestion games. The existing dataset coverage assumption in offline general-sum games inevitably incurs a dependency on the number of actions, which can be exponentially large in congestion games. We consider three different types of feedback with decreasing revealed information. Starting from the facility-level (a.k.a., semi-bandit) feedback, we propose a novel one-unit deviation coverage condition and give a pessimism-type algorithm that can recover an approximate NE. For the agent-level (a.k.a., bandit) feedback setting, interestingly, we show the one-unit deviation coverage condition is not sufficient. On the other hand, we convert the game to multi-agent linear bandits and show that with a generalized data coverage assumption in offline linear bandits, we can efficiently recover the approximate NE. Lastly, we consider a novel type of feedback, the game-level feedback where only the total reward from all agents is revealed. Again, we show the coverage assumption for the agent-level feedback setting is insufficient in the game-level feedback setting, and with a stronger version of the data coverage assumption for linear bandits, we can recover an approximate NE. Together, our results constitute the first study of offline congestion games and imply formal separations between different types of feedback.
