随机多军匪徒的流算法

论文标题

随机多军匪徒的流算法

Streaming Algorithms for Stochastic Multi-armed Bandits

论文作者

Maiti, Arnab, Patil, Vishakha, Khan, Arindam

论文摘要

我们研究了有限的手臂记忆中的随机多臂匪徒问题。在这种情况下，手臂到达流中，并且可以随时存储在内存中的武器数量。决策者只能拉动记忆中存在的手臂。我们从两个标准目标的角度解决了问题：1）遗憾的最小化和2）最佳武器识别。为了最小化遗憾，我们通过表现出几乎紧密的硬度来解决一个重要的开放问题。我们在对（n-1）的手臂记忆大小（n是n是臂的数量）的期望中显示ω（t^{2/3}）累积遗憾。为了获得最佳武器识别，我们研究了两种算法。首先，我们提出了一个O（R）手臂记忆R-ROND自适应流算法，以找到最佳的臂。在用于最佳臂识别的R-ROUND自适应流算法中，根据早期回合的观察结果，每轮的手臂拉动是在每轮中的。最好的臂是在R回合结束时的输出。我们算法的样品复杂性上的上限与任何R-ROND自适应流算法的下限匹配。其次，我们提出一种启发式方法，通过在记忆中仅存储一个额外的手臂来找到具有最佳样品复杂性的ε-最佳臂。

We study the Stochastic Multi-armed Bandit problem under bounded arm-memory. In this setting, the arms arrive in a stream, and the number of arms that can be stored in the memory at any time, is bounded. The decision-maker can only pull arms that are present in the memory. We address the problem from the perspective of two standard objectives: 1) regret minimization, and 2) best-arm identification. For regret minimization, we settle an important open question by showing an almost tight hardness. We show Ω(T^{2/3}) cumulative regret in expectation for arm-memory size of (n-1), where n is the number of arms. For best-arm identification, we study two algorithms. First, we present an O(r) arm-memory r-round adaptive streaming algorithm to find an ε-best arm. In r-round adaptive streaming algorithm for best-arm identification, the arm pulls in each round are decided based on the observed outcomes in the earlier rounds. The best-arm is the output at the end of r rounds. The upper bound on the sample complexity of our algorithm matches with the lower bound for any r-round adaptive streaming algorithm. Secondly, we present a heuristic to find the ε-best arm with optimal sample complexity, by storing only one extra arm in the memory.

下载PDF全文

下载文献需遵守相关版权规定

论文标题