Transfer Learning for Contextual Multi-armed Bandits

Paper Title

Transfer Learning for Contextual Multi-armed Bandits

Authors

Changxiao Cai, T. Tony Cai, Hongzhe Li

Abstract

Motivated by a range of applications, we study in this paper the problem of transfer learning for nonparametric contextual multi-armed bandits under the covariate shift model, where we have data collected on source bandits before the start of the target bandit learning. The minimax rate of convergence for the cumulative regret is established and a novel transfer learning algorithm that attains the minimax regret is proposed. The results quantify the contribution of the data from the source domains for learning in the target domain in the context of nonparametric contextual multi-armed bandits. In view of the general impossibility of adaptation to unknown smoothness, we develop a data-driven algorithm that achieves near-optimal statistical guarantees (up to a logarithmic factor) while automatically adapting to the unknown parameters over a large collection of parameter spaces under an additional self-similarity assumption. A simulation study is carried out to illustrate the benefits of utilizing the data from the auxiliary source domains for learning in the target domain.
