论文标题

随机多军匪徒的流算法

Streaming Algorithms for Stochastic Multi-armed Bandits

论文作者

Maiti, Arnab, Patil, Vishakha, Khan, Arindam

论文摘要

我们研究了有限的手臂记忆中的随机多臂匪徒问题。在这种情况下,手臂到达流中,并且可以随时存储在内存中的武器数量。决策者只能拉动记忆中存在的手臂。我们从两个标准目标的角度解决了问题:1)遗憾的最小化和2)最佳武器识别。为了最小化遗憾,我们通过表现出几乎紧密的硬度来解决一个重要的开放问题。我们在对(n-1)的手臂记忆大小(n是n是臂的数量)的期望中显示ω(t^{2/3})累积遗憾。为了获得最佳武器识别,我们研究了两种算法。首先,我们提出了一个O(R)手臂记忆R-ROND自适应流算法,以找到最佳的臂。在用于最佳臂识别的R-ROUND自适应流算法中,根据早期回合的观察结果,每轮的手臂拉动是在每轮中的。最好的臂是在R回合结束时的输出。我们算法的样品复杂性上的上限与任何R-ROND自适应流算法的下限匹配。其次,我们提出一种启发式方法,通过在记忆中仅存储一个额外的手臂来找到具有最佳样品复杂性的ε-最佳臂。

We study the Stochastic Multi-armed Bandit problem under bounded arm-memory. In this setting, the arms arrive in a stream, and the number of arms that can be stored in the memory at any time, is bounded. The decision-maker can only pull arms that are present in the memory. We address the problem from the perspective of two standard objectives: 1) regret minimization, and 2) best-arm identification. For regret minimization, we settle an important open question by showing an almost tight hardness. We show Ω(T^{2/3}) cumulative regret in expectation for arm-memory size of (n-1), where n is the number of arms. For best-arm identification, we study two algorithms. First, we present an O(r) arm-memory r-round adaptive streaming algorithm to find an ε-best arm. In r-round adaptive streaming algorithm for best-arm identification, the arm pulls in each round are decided based on the observed outcomes in the earlier rounds. The best-arm is the output at the end of r rounds. The upper bound on the sample complexity of our algorithm matches with the lower bound for any r-round adaptive streaming algorithm. Secondly, we present a heuristic to find the ε-best arm with optimal sample complexity, by storing only one extra arm in the memory.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源