Paper Title

Online Learning in Contextual Bandits using Gated Linear Networks

Authors

Eren Sezener, Marcus Hutter, David Budden, Jianan Wang, Joel Veness

Abstract


We introduce a new and completely online contextual bandit algorithm called Gated Linear Contextual Bandits (GLCB). This algorithm is based on Gated Linear Networks (GLNs), a recently introduced deep learning architecture with properties well-suited to the online setting. Leveraging data-dependent gating properties of the GLN we are able to estimate prediction uncertainty with effectively zero algorithmic overhead. We empirically evaluate GLCB compared to 9 state-of-the-art algorithms that leverage deep neural networks, on a standard benchmark suite of discrete and continuous contextual bandit problems. GLCB obtains median first-place despite being the only online method, and we further support these results with a theoretical study of its convergence properties.
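The abstract's claim of "effectively zero algorithmic overhead" rests on the data-dependent gating used by GLNs: a fixed set of random hyperplanes maps each context to a discrete region, and visit counts for those regions can serve as pseudo-counts for an exploration bonus. The following is a minimal sketch of that halfspace-gating idea; the hyperplane count, region-counting scheme, and square-root bonus form are illustrative assumptions for exposition, not the paper's exact GLCB algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

CONTEXT_DIM = 4
NUM_HYPERPLANES = 3  # partitions context space into 2**3 = 8 regions

# Fixed random hyperplanes define the data-dependent gating.
hyperplanes = rng.standard_normal((NUM_HYPERPLANES, CONTEXT_DIM))

def gate_index(context):
    """Map a context to a region index via signs of hyperplane projections."""
    bits = (hyperplanes @ context > 0).astype(int)
    return int("".join(map(str, bits)), 2)

# Per-region visit counts act as pseudo-counts: rarely visited
# regions get a larger uncertainty bonus, encouraging exploration.
counts = np.zeros(2 ** NUM_HYPERPLANES)

def uncertainty(context):
    """UCB-style bonus that shrinks as a gated region is revisited (assumed form)."""
    return 1.0 / np.sqrt(counts[gate_index(context)] + 1.0)

# Observing a context costs one gating computation plus a counter
# increment, which is the sense in which the overhead is near zero.
x = rng.standard_normal(CONTEXT_DIM)
bonus_before = uncertainty(x)
counts[gate_index(x)] += 1
bonus_after = uncertainty(x)
```

In a bandit loop, an agent would compute such a bonus per arm and add it to the arm's predicted reward before selecting the maximum.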
