Title
Hypothesis Transfer in Bandits by Weighted Models
Authors
Abstract
We consider the problem of contextual multi-armed bandits in the setting of hypothesis transfer learning. That is, we assume access to a model previously learned on an unobserved set of contexts, and we leverage it to accelerate exploration on a new bandit problem. Our transfer strategy is based on a re-weighting scheme for which we show a reduction in regret over classic Linear UCB when transfer is desirable, while recovering the classic regret rate when the two tasks are unrelated. We further extend this method to an arbitrary number of source models, where the algorithm decides which model is preferred at each time step. Additionally, we discuss an approach in which a dynamic convex combination of source models enters as a biased regularization term in the classic LinUCB algorithm. The algorithms and the theoretical analysis of our proposed methods are substantiated by empirical evaluations on simulated and real-world data.
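To make the biased-regularization idea concrete, the following is a minimal sketch of a LinUCB variant whose ridge penalty is centered on a source parameter vector rather than on zero. This is an illustrative reading of the abstract, not the paper's actual algorithm: the class name, parameter names, and the use of a single fixed source model (rather than a dynamic convex combination of several) are all assumptions made for the example.

```python
import numpy as np


class BiasedLinUCB:
    """LinUCB with a ridge term biased toward a source model.

    Penalizing ||theta - theta_src||^2 instead of ||theta||^2 pulls
    early estimates toward the transferred parameters; with
    theta_src = 0 this reduces to classic LinUCB. Illustrative
    sketch only; names and interface are assumptions.
    """

    def __init__(self, dim, theta_src, lam=1.0, alpha=1.0):
        self.A = lam * np.eye(dim)       # regularized Gram matrix
        self.b = lam * theta_src.copy()  # bias toward the source model
        self.alpha = alpha               # exploration width

    def select(self, arms):
        """arms: (n_arms, dim) context matrix; returns best UCB index."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b           # biased ridge estimate
        # UCB score: estimated reward plus confidence width per arm.
        width = np.sqrt(np.einsum("ij,jk,ik->i", arms, A_inv, arms))
        return int(np.argmax(arms @ theta + self.alpha * width))

    def update(self, x, reward):
        """Standard rank-one update after observing a reward."""
        self.A += np.outer(x, x)
        self.b += reward * x
```

When the source task matches the target, the estimate starts near the true parameter and exploration is shortened; when it does not, accumulated observations eventually dominate the fixed bias term, which is the intuition behind recovering the classic regret rate.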