Paper Title

Sample-Then-Optimize Batch Neural Thompson Sampling

Paper Authors

Zhongxiang Dai, Yao Shu, Bryan Kian Hsiang Low, Patrick Jaillet

Paper Abstract

Bayesian optimization (BO), which uses a Gaussian process (GP) as a surrogate to model its objective function, is popular for black-box optimization. However, due to the limitations of GPs, BO underperforms in some problems such as those with categorical, high-dimensional or image inputs. To this end, recent works have used the highly expressive neural networks (NNs) as the surrogate model and derived theoretical guarantees using the theory of neural tangent kernel (NTK). However, these works suffer from the limitations of the requirement to invert an extremely large parameter matrix and the restriction to the sequential (rather than batch) setting. To overcome these limitations, we introduce two algorithms based on the Thompson sampling (TS) policy named Sample-Then-Optimize Batch Neural TS (STO-BNTS) and STO-BNTS-Linear. To choose an input query, we only need to train an NN (resp. a linear model) and then choose the query by maximizing the trained NN (resp. linear model), which is equivalently sampled from the GP posterior with the NTK as the kernel function. As a result, our algorithms sidestep the need to invert the large parameter matrix yet still preserve the validity of the TS policy. Next, we derive regret upper bounds for our algorithms with batch evaluations, and use insights from batch BO and NTK to show that they are asymptotically no-regret under certain conditions. Finally, we verify their empirical effectiveness using practical AutoML and reinforcement learning experiments.
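The query-selection step described in the abstract lends itself to a compact illustration. The following is a minimal, hypothetical sketch (not the authors' released code) of the sample-then-optimize loop in the spirit of STO-BNTS-Linear: for each slot in the batch, a randomly initialized linear model is trained on features of the observed data with perturbed targets, and the next query is chosen by maximizing the trained model over a candidate set. For simplicity the sketch uses random Fourier features in place of the NTK features from the paper; the helper names (random_features, fit_linear_sample) and the toy 1-D objective are illustrative assumptions.

```python
# Illustrative sketch of the sample-then-optimize batch Thompson sampling idea.
# Assumptions: random Fourier features stand in for NTK features; helper names
# are hypothetical and not taken from the paper's code.
import numpy as np

rng = np.random.default_rng(0)

def objective(x):
    """Toy 1-D black-box objective used only for this sketch."""
    return np.sin(3.0 * x) + 0.5 * x

def random_features(X, W, b):
    """Random Fourier-style feature map phi(x) (placeholder for NTK features)."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

def fit_linear_sample(Phi, y, noise=0.1):
    """Fit a randomly initialized linear model on perturbed targets.

    The closed-form minimizer below yields one (approximate) posterior sample
    of the weights, so maximizing the fitted model implements Thompson sampling
    without explicitly inverting a large parameter matrix at prediction time.
    """
    y_pert = y + noise * rng.standard_normal(y.shape)   # perturbed observations
    theta0 = rng.standard_normal(Phi.shape[1])           # random initialization
    A = Phi.T @ Phi + noise ** 2 * np.eye(Phi.shape[1])
    return theta0 + np.linalg.solve(A, Phi.T @ (y_pert - Phi @ theta0))

# Shared feature map for the whole run.
m = 200                                    # number of random features
W = rng.standard_normal((1, m))
b = rng.uniform(0.0, 2 * np.pi, size=m)

# Candidate inputs and a few initial observations.
X_cand = np.linspace(-2.0, 2.0, 400).reshape(-1, 1)
X_obs = rng.uniform(-2.0, 2.0, size=(3, 1))
y_obs = objective(X_obs).ravel()

batch_size, n_rounds = 4, 5
for _ in range(n_rounds):
    Phi_obs = random_features(X_obs, W, b)
    Phi_cand = random_features(X_cand, W, b)
    batch = []
    for _ in range(batch_size):
        theta = fit_linear_sample(Phi_obs, y_obs)      # sample-then-optimize
        batch.append(X_cand[np.argmax(Phi_cand @ theta)])  # maximize trained model
    batch = np.array(batch)
    X_obs = np.vstack([X_obs, batch])                  # batch evaluation
    y_obs = np.concatenate([y_obs, objective(batch).ravel()])

print("best observed value:", y_obs.max())
```

In this sketch, replacing the linear model over random features with a wide neural network trained on the same perturbed data would correspond to the STO-BNTS variant; in both cases the expensive matrix inversion over the full parameter space is avoided while the Thompson sampling interpretation is retained.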
