Paper Title
Gaussian Process Classification Bandits
Paper Authors
Paper Abstract
Classification bandits are multi-armed bandit problems whose task is to classify a given set of arms into either the positive or the negative class, depending on whether the rate of arms with expected reward of at least h is not less than w, for given thresholds h and w. We study a special classification bandit problem in which arms correspond to points x in d-dimensional real space and their expected rewards f(x) are generated according to a Gaussian process prior. We develop a framework algorithm for the problem that accommodates various arm selection policies, and propose policies called FCB and FTSV. We show a smaller sample complexity upper bound for FCB than that of the existing level set estimation algorithm, which must decide whether f(x) is at least h for every arm's x. We also propose arm selection policies that depend on an estimated rate of arms with rewards of at least h, and show that they improve empirical sample complexity. According to our experimental results, the rate-estimation versions of FCB and FTSV, together with that of the popular active learning policy that selects the point with the maximum variance, outperform the other policies on synthetic functions, and the rate-estimation version of FTSV is also the best performer on our real-world dataset.
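The following is a minimal sketch of the classification-bandit setup described above, not the paper's FCB or FTSV policies (which are not specified in the abstract): arms are points x whose rewards are modeled with a Gaussian process, arms are queried with the maximum-variance active learning policy mentioned in the abstract, and the final positive/negative decision compares the estimated rate of arms with f(x) >= h against w. All function and parameter names (classify_arms, pull, budget, noise) are hypothetical, and scikit-learn's GaussianProcessRegressor is assumed as the GP posterior model.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def classify_arms(arms, pull, h, w, budget=200, noise=0.1):
    """arms: (n, d) array of arm points x; pull(i) returns a noisy reward of arm i.

    Illustrative sketch only; FCB/FTSV from the paper would replace the
    maximum-variance selection rule below with their own criteria.
    """
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=noise**2)
    X_obs, y_obs = [], []
    for _ in range(budget):
        if X_obs:
            gp.fit(np.array(X_obs), np.array(y_obs))
            _, std = gp.predict(arms, return_std=True)
        else:
            std = np.ones(len(arms))   # no observations yet: all arms equally uncertain
        i = int(np.argmax(std))        # maximum-variance (active learning) arm selection
        X_obs.append(arms[i])
        y_obs.append(pull(i))
    gp.fit(np.array(X_obs), np.array(y_obs))
    mean = gp.predict(arms)
    rate = float(np.mean(mean >= h))   # estimated rate of arms with f(x) >= h
    return "positive" if rate >= w else "negative"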