Paper Title

Optimal Regret Is Achievable with Bounded Approximate Inference Error: An Enhanced Bayesian Upper Confidence Bound Framework

Paper Authors

Ziyi Huang, Henry Lam, Amirhossein Meisami, Haofeng Zhang

Paper Abstract

Bayesian bandit algorithms with approximate Bayesian inference have been widely used in real-world applications. However, there is a large discrepancy between the superior practical performance of these approaches and their theoretical justification. Previous research only indicates a negative theoretical result: Thompson sampling could have a worst-case linear regret $\Omega(T)$ with a constant threshold on the inference error measured by one $\alpha$-divergence. To bridge this gap, we propose an Enhanced Bayesian Upper Confidence Bound (EBUCB) framework that can efficiently accommodate bandit problems in the presence of approximate inference. Our theoretical analysis demonstrates that for Bernoulli multi-armed bandits, EBUCB can achieve the optimal regret order $O(\log T)$ if the inference error measured by two different $\alpha$-divergences is less than a constant, regardless of how large this constant is. To the best of our knowledge, our study provides the first theoretical regret bound that is better than $o(T)$ in the setting of constant approximate inference error. Furthermore, in concordance with the negative results in previous studies, we show that only one bounded $\alpha$-divergence is insufficient to guarantee a sub-linear regret.
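
For reference, the inference error in the abstract is measured by $\alpha$-divergences. The paper's exact convention is not stated here, so the following is the standard textbook definition. For distributions $P$ and $Q$ with densities $p$ and $q$,

$$ D_{\alpha}(P \,\|\, Q) = \frac{1}{\alpha(\alpha - 1)}\left(\int p(x)^{\alpha}\, q(x)^{1-\alpha}\,\mathrm{d}x - 1\right), \qquad \alpha \in \mathbb{R} \setminus \{0, 1\}, $$

with $\mathrm{KL}(P \,\|\, Q)$ and $\mathrm{KL}(Q \,\|\, P)$ recovered in the limits $\alpha \to 1$ and $\alpha \to 0$. Requiring two different $\alpha$-divergences to be bounded, as in the EBUCB analysis, is strictly stronger than bounding a single one, which is consistent with the abstract's negative result for a single bounded $\alpha$-divergence.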
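
To ground the algorithmic setting, here is a minimal Python sketch of the classical Bayes-UCB rule for Bernoulli bandits with exact Beta posteriors (Kaufmann et al., 2012), the baseline that EBUCB enhances. This is an illustrative sketch, not the authors' EBUCB: the abstract does not spell out the enhanced construction, and the function name `bayes_ucb_bernoulli`, its parameters, and the quantile schedule are assumptions for demonstration.

```python
import numpy as np
from scipy.stats import beta

def bayes_ucb_bernoulli(true_means, horizon, c=0, rng=None):
    """Classical Bayes-UCB on a Bernoulli bandit with exact Beta
    posteriors. Illustrative baseline only; EBUCB modifies this
    index to stay robust under approximate posteriors."""
    rng = np.random.default_rng() if rng is None else rng
    n_arms = len(true_means)
    a = np.ones(n_arms)  # Beta(1, 1) uniform prior: successes + 1
    b = np.ones(n_arms)  # failures + 1
    best, regret = max(true_means), 0.0
    for t in range(1, horizon + 1):
        # Index = posterior quantile at level 1 - 1/(t * (log T)^c)
        level = 1.0 - 1.0 / (t * np.log(horizon) ** c)
        arm = int(np.argmax(beta.ppf(level, a, b)))
        reward = float(rng.random() < true_means[arm])
        a[arm] += reward
        b[arm] += 1.0 - reward
        regret += best - true_means[arm]
    return regret

# Example: cumulative regret grows like O(log T) on a 2-armed instance.
print(bayes_ucb_bernoulli([0.5, 0.6], horizon=10_000))
```

The paper's claim is that a suitably enhanced version of this quantile index retains the $O(\log T)$ regret order even when the posteriors are only approximate with bounded error; the sketch above illustrates only the exact-posterior baseline.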
