Paper Title


Meta Representation Learning with Contextual Linear Bandits

Paper Authors

Cella, Leonardo, Lounici, Karim, Pontil, Massimiliano

Paper Abstract


Meta-learning seeks to build algorithms that rapidly learn how to solve new learning problems based on previous experience. In this paper we investigate meta-learning in the setting of stochastic linear bandit tasks. We assume that the tasks share a low-dimensional representation, which has been partially acquired from previous learning tasks. We aim to leverage this information in order to learn a new downstream bandit task that shares the same representation. Our principal contribution is to show that if the learned representation estimates the unknown one well, then the downstream task can be efficiently learned by a greedy policy that we propose in this work. We derive an upper bound on the regret of this policy, which is, up to logarithmic factors, of order $r\sqrt{N}(1\vee \sqrt{d/T})$, where $N$ is the horizon of the downstream task, $T$ is the number of training tasks, $d$ is the ambient dimension, and $r \ll d$ is the dimension of the representation. We highlight that our strategy does not need to know $r$. We note that if $T> d$, our bound achieves the same rate as minimax-optimal bandit algorithms that use the true underlying representation. Our analysis is inspired by, and builds in part upon, previous work on meta-learning in the i.i.d. full information setting \citep{tripuraneni2021provable,boursier2022trace}. As a separate contribution, we show how to relax certain assumptions in those works, thereby improving their representation learning and risk analysis.
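The greedy policy described in the abstract can be illustrated with a minimal sketch: contexts are projected through an (assumed already-learned) representation $\hat{B} \in \mathbb{R}^{d\times r}$, a ridge estimate is maintained in the $r$-dimensional subspace, and the action with the highest estimated reward is always chosen, with no exploration bonus. All names and constants below (`B_hat`, the perturbation level, the noise scale) are illustrative assumptions, not the paper's actual construction.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, N, K = 20, 3, 500, 10  # ambient dim, representation dim, horizon, actions per round

# Hypothetical setup: the true parameter lies in a low-dimensional subspace.
B_true, _ = np.linalg.qr(rng.standard_normal((d, r)))  # true representation (d x r)
theta = B_true @ rng.standard_normal(r)                # true d-dimensional parameter

# Stand-in for a representation estimated from T upstream tasks:
# here we simply perturb the true one for illustration.
B_hat, _ = np.linalg.qr(B_true + 0.05 * rng.standard_normal((d, r)))

# Greedy policy on the learned representation: ridge regression in r dimensions,
# always picking the action with the highest estimated reward.
A = np.eye(r)          # ridge regularizer (lambda = 1)
b = np.zeros(r)
regret = 0.0
for t in range(N):
    X = rng.standard_normal((K, d))      # contexts of the K candidate actions
    Z = X @ B_hat                        # projected, r-dimensional features
    alpha_hat = np.linalg.solve(A, b)    # ridge estimate in the subspace
    a = int(np.argmax(Z @ alpha_hat))    # greedy action (no exploration bonus)
    reward = X[a] @ theta + 0.1 * rng.standard_normal()
    A += np.outer(Z[a], Z[a])
    b += reward * Z[a]
    regret += np.max(X @ theta) - X[a] @ theta

print(f"cumulative regret over {N} rounds: {regret:.2f}")
```

Because estimation happens in $r$ dimensions rather than $d$, the per-round cost of the ridge update scales with $r$, which is the source of the $r\sqrt{N}$ factor in the regret bound when the representation is accurate.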
