Paper Title

Online Learning in Contextual Bandits using Gated Linear Networks

Authors

Eren Sezener, Marcus Hutter, David Budden, Jianan Wang, Joel Veness

Abstract


We introduce a new and completely online contextual bandit algorithm called Gated Linear Contextual Bandits (GLCB). This algorithm is based on Gated Linear Networks (GLNs), a recently introduced deep learning architecture with properties well-suited to the online setting. Leveraging data-dependent gating properties of the GLN we are able to estimate prediction uncertainty with effectively zero algorithmic overhead. We empirically evaluate GLCB compared to 9 state-of-the-art algorithms that leverage deep neural networks, on a standard benchmark suite of discrete and continuous contextual bandit problems. GLCB obtains median first-place despite being the only online method, and we further support these results with a theoretical study of its convergence properties.
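The abstract's claim of "effectively zero algorithmic overhead" rests on the data-dependent gating used by GLNs: a fixed set of random hyperplanes maps each context to a discrete region, and visit counts for those regions can serve as pseudo-counts for an exploration bonus. The following is a minimal sketch of that halfspace-gating idea; the hyperplane count, region-counting scheme, and square-root bonus form are illustrative assumptions for exposition, not the paper's exact GLCB algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

CONTEXT_DIM = 4
NUM_HYPERPLANES = 3  # partitions context space into 2**3 = 8 regions

# Fixed random hyperplanes define the data-dependent gating.
hyperplanes = rng.standard_normal((NUM_HYPERPLANES, CONTEXT_DIM))

def gate_index(context):
    """Map a context to a region index via signs of hyperplane projections."""
    bits = (hyperplanes @ context > 0).astype(int)
    return int("".join(map(str, bits)), 2)

# Per-region visit counts act as pseudo-counts: rarely visited
# regions get a larger uncertainty bonus, encouraging exploration.
counts = np.zeros(2 ** NUM_HYPERPLANES)

def uncertainty(context):
    """UCB-style bonus that shrinks as a gated region is revisited (assumed form)."""
    return 1.0 / np.sqrt(counts[gate_index(context)] + 1.0)

# Observing a context costs one gating computation plus a counter
# increment, which is the sense in which the overhead is near zero.
x = rng.standard_normal(CONTEXT_DIM)
bonus_before = uncertainty(x)
counts[gate_index(x)] += 1
bonus_after = uncertainty(x)
```

In a bandit loop, an agent would compute such a bonus per arm and add it to the arm's predicted reward before selecting the maximum.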
