Paper Title

Provably and Practically Efficient Neural Contextual Bandits

Authors

Sudeep Salgia, Sattar Vakili, Qing Zhao

Abstract

We consider the neural contextual bandit problem. In contrast to existing work, which primarily focuses on ReLU neural nets, we consider a general set of smooth activation functions. Under this more general setting, (i) we derive non-asymptotic error bounds on the difference between an overparameterized neural net and its corresponding neural tangent kernel, and (ii) we propose an algorithm with a provably sublinear regret bound that is also efficient in the finite regime, as demonstrated by empirical studies. The non-asymptotic error bounds may be of broader interest as a tool to establish the relation between the smoothness of the activation functions in neural contextual bandits and the smoothness of the kernels in kernel bandits.
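To make the setting concrete, the following is a minimal illustrative sketch of a neural contextual bandit loop — not the paper's algorithm. It uses one small network per arm with a smooth activation (tanh, standing in for the general smooth activations the paper studies), epsilon-greedy exploration, and online SGD updates; the synthetic linear reward model and all hyperparameters are assumptions chosen only for the demo.

```python
# Illustrative sketch (NOT the paper's algorithm): a neural contextual
# bandit with a smooth activation (tanh) and epsilon-greedy exploration.
import numpy as np

rng = np.random.default_rng(0)
d, n_arms, width, T = 5, 4, 32, 300

# Hidden reward model (synthetic, assumed for the demo): one linear map per arm.
theta = rng.normal(size=(n_arms, d))

# One one-hidden-layer net per arm; tanh is smooth, unlike ReLU.
W1 = rng.normal(scale=1 / np.sqrt(d), size=(n_arms, width, d))
w2 = rng.normal(scale=1 / np.sqrt(width), size=(n_arms, width))

def predict(a, x):
    """Predicted reward of arm a for context x."""
    return np.tanh(W1[a] @ x) @ w2[a]

def sgd_step(a, x, r, lr=0.05):
    """One SGD step on the squared error of arm a's network."""
    h = np.tanh(W1[a] @ x)
    err = h @ w2[a] - r
    w2[a] -= lr * err * h
    W1[a] -= lr * err * np.outer(w2[a] * (1 - h**2), x)

regret = 0.0
for t in range(T):
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)                      # unit-norm context
    eps = max(0.05, (t + 1) ** -0.5)            # decaying exploration rate
    if rng.random() < eps:
        a = int(rng.integers(n_arms))           # explore
    else:
        a = int(np.argmax([predict(b, x) for b in range(n_arms)]))  # exploit
    mean_rewards = theta @ x
    r = mean_rewards[a] + 0.1 * rng.normal()    # noisy observed reward
    regret += mean_rewards.max() - mean_rewards[a]
    sgd_step(a, x, r)

print(round(regret / T, 3))  # average per-round regret
```

The per-arm networks and epsilon-greedy rule are deliberate simplifications; the paper's provably sublinear-regret algorithm relies on the NTK error bounds rather than this heuristic exploration.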
