基于SPRT的有效最佳手臂识别随机匪徒

论文标题

基于SPRT的有效最佳手臂识别随机匪徒

SPRT-based Efficient Best Arm Identification in Stochastic Bandits

论文作者

Mukherjee, Arpan, Tajer, Ali

论文摘要

本文在固定置信度设置中研究了随机多臂匪徒中最佳的手臂识别（BAI）问题。考虑到指数匪徒的一般类。指数匪徒家族的现有算法面临计算挑战。为了减轻这些挑战，将BAI问题视为顺序的复合假设测试任务，并提出了一个框架，该框架采用了基于似然比的测试，已知可对顺序测试有效。基于此测试统计量，设计了BAI算法的设计，该算法利用规范的顺序概率比测试进行了手臂选择，并且可以对指数的匪徒进行易于分析。该算法具有两个关键特征：（1）其样品复杂性在渐近上是最佳的，并且（2）保证它是$δ-$ pAC。现有的有效方法集中在高斯环境上，并需要对汤普森进行抽样，以使手臂被认为是最好的和挑战者手臂。此外，本文在分析上量化了在现有方法中识别挑战者的计算费用。最后，提供了数值实验来支持分析。

This paper investigates the best arm identification (BAI) problem in stochastic multi-armed bandits in the fixed confidence setting. The general class of the exponential family of bandits is considered. The existing algorithms for the exponential family of bandits face computational challenges. To mitigate these challenges, the BAI problem is viewed and analyzed as a sequential composite hypothesis testing task, and a framework is proposed that adopts the likelihood ratio-based tests known to be effective for sequential testing. Based on this test statistic, a BAI algorithm is designed that leverages the canonical sequential probability ratio tests for arm selection and is amenable to tractable analysis for the exponential family of bandits. This algorithm has two key features: (1) its sample complexity is asymptotically optimal, and (2) it is guaranteed to be $δ-$PAC. Existing efficient approaches focus on the Gaussian setting and require Thompson sampling for the arm deemed the best and the challenger arm. Additionally, this paper analytically quantifies the computational expense of identifying the challenger in an existing approach. Finally, numerical experiments are provided to support the analysis.

下载PDF全文

下载文献需遵守相关版权规定

论文标题