Title
Gaussian Imagination in Bandit Learning
Authors
Abstract
Assuming distributions are Gaussian often facilitates computations that are otherwise intractable. We study the performance of an agent that attains a bounded information ratio with respect to a bandit environment with a Gaussian prior distribution and a Gaussian likelihood function when applied instead to a Bernoulli bandit. Relative to an information-theoretic bound on the Bayesian regret the agent would incur when interacting with the Gaussian bandit, we bound the increase in regret when the agent interacts with the Bernoulli bandit. If the Gaussian prior distribution and likelihood function are sufficiently diffuse, this increase grows at a rate which is at most linear in the square-root of the time horizon, and thus the per-timestep increase vanishes. Our results formalize the folklore that so-called Bayesian agents remain effective when instantiated with diffuse misspecified distributions.
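The setting the abstract describes can be illustrated with a small simulation. The sketch below is a hypothetical instance, not the paper's exact agent: a Thompson-sampling agent that maintains a diffuse Gaussian prior and assumes a Gaussian likelihood ("Gaussian imagination"), while the environment actually returns Bernoulli rewards. The arm means, prior parameters, and horizon are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Bernoulli bandit: true mean rewards (unknown to the agent).
true_means = np.array([0.3, 0.5, 0.7])
n_arms = len(true_means)

# The agent "imagines" a Gaussian model: a diffuse N(mu0, sigma0^2) prior on
# each arm's mean and a Gaussian likelihood with noise variance sigma^2,
# even though rewards are in fact Bernoulli (i.e. the model is misspecified).
mu0, sigma0_sq, sigma_sq = 0.5, 10.0, 1.0

# Conjugate Gaussian posterior parameters, one per arm.
post_mean = np.full(n_arms, mu0)
post_var = np.full(n_arms, sigma0_sq)

T = 2000
regret = 0.0
for t in range(T):
    # Thompson sampling under the Gaussian imagination:
    # sample a mean for each arm from its posterior, play the argmax.
    samples = rng.normal(post_mean, np.sqrt(post_var))
    arm = int(np.argmax(samples))
    reward = float(rng.random() < true_means[arm])  # Bernoulli reward

    # Gaussian conjugate posterior update (precision-weighted average),
    # treating the Bernoulli reward as if it were Gaussian.
    precision = 1.0 / post_var[arm] + 1.0 / sigma_sq
    post_mean[arm] = (post_mean[arm] / post_var[arm]
                      + reward / sigma_sq) / precision
    post_var[arm] = 1.0 / precision

    regret += true_means.max() - true_means[arm]

print(f"cumulative regret over {T} steps: {regret:.1f}")
```

Consistent with the folklore the abstract formalizes, such an agent typically still concentrates on the best arm: because the prior is diffuse, the misspecified Gaussian posterior is quickly dominated by the observed (Bernoulli) rewards, and cumulative regret grows sublinearly, so per-timestep regret vanishes.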