迈向带有随机步行反馈的多军匪徒的基本限制

论文标题

迈向带有随机步行反馈的多军匪徒的基本限制

Towards Fundamental Limits of Multi-armed Bandits with Random Walk Feedback

论文作者

Wang, Tianyu, Yang, Lin F., Wang, Zizhuo

论文摘要

在本文中，我们考虑了一个新的多武器强盗（MAB）问题，其中手臂是一个未知且可能变化的图形中的节点，并且代理（i）通过拉动手臂启动随机步行，（ii）观察随机步行轨迹，（iii）接收到与步行长度相等的回报。我们通过研究随机环境和对抗性环境，对这个问题提供了全面的理解。我们表明，在信息理论意义上，尽管可以通过随机步行轨迹获得其他信息，但在信息理论意义上，这个问题并不比标准mAB容易。还研究了强盗算法在此问题上的行为。

In this paper, we consider a new Multi-Armed Bandit (MAB) problem where arms are nodes in an unknown and possibly changing graph, and the agent (i) initiates random walks over the graph by pulling arms, (ii) observes the random walk trajectories, and (iii) receives rewards equal to the lengths of the walks. We provide a comprehensive understanding of this problem by studying both the stochastic and the adversarial setting. We show that this problem is not easier than a standard MAB in an information theoretical sense, although additional information is available through random walk trajectories. Behaviors of bandit algorithms on this problem are also studied.

下载PDF全文

下载文献需遵守相关版权规定

论文标题