ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits

Paper title

ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits

Authors

Thomas Kleine Buening, Aadirupa Saha

Abstract

We study the problem of non-stationary dueling bandits and provide the first adaptive dynamic regret algorithm for this problem. The only two existing attempts in this line of work fall short across multiple dimensions, including pessimistic measures of non-stationary complexity and non-adaptive parameter tuning that requires knowledge of the number of preference changes. We develop an elimination-based rescheduling algorithm to overcome these shortcomings and show a near-optimal $\tilde{O}(\sqrt{S^{\texttt{CW}} T})$ dynamic regret bound, where $S^{\texttt{CW}}$ is the number of times the Condorcet winner changes in $T$ rounds. This yields the first near-optimal dynamic regret algorithm for unknown $S^{\texttt{CW}}$. We further study other related notions of non-stationarity for which we also prove near-optimal dynamic regret guarantees under additional assumptions on the underlying preference model.

Paper title