Paper Title

Near Optimal Adversarial Attacks on Stochastic Bandits and Defenses with Smoothed Responses

Paper Authors

Zuo, Shiliang

Paper Abstract

I study adversarial attacks against stochastic bandit algorithms. At each round, the learner chooses an arm, and a stochastic reward is generated. The adversary strategically adds corruption to the reward, and the learner is only able to observe the corrupted reward at each round. Two sets of results are presented in this paper. The first set studies the optimal attack strategies for the adversary. The adversary has a target arm he wishes to promote, and his goal is to manipulate the learner into choosing this target arm $T - o(T)$ times. I design attack strategies against UCB and Thompson Sampling that only spend $\widehat{O}(\sqrt{\log T})$ cost. Matching lower bounds are presented, and the vulnerabilities of UCB, Thompson Sampling, and $\varepsilon$-greedy are exactly characterized. The second set studies how the learner can defend against the adversary. Inspired by the literature on smoothed analysis and behavioral economics, I present two simple algorithms that achieve a competitive ratio arbitrarily close to 1.
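To make the threat model concrete, here is a minimal simulation sketch of reward corruption against UCB1. The attack shown is an illustrative heuristic, not the paper's actual strategy or cost analysis: whenever the learner pulls a non-target arm, the adversary pushes the observed reward down so that arm's empirical mean stays below the target's. The arm means, horizon, and helper name are all assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def ucb_with_attack(T=5000, means=(0.9, 0.5), target=1, attack=True):
    """Simulate UCB1 on Bernoulli arms under reward corruption.

    Illustrative heuristic attack (NOT the paper's strategy): whenever a
    non-target arm is pulled, the adversary zeroes out the observed reward,
    so UCB's empirical means favor the target arm.
    """
    K = len(means)
    counts = np.zeros(K)
    sums = np.zeros(K)          # sums of *observed* (possibly corrupted) rewards
    target_pulls = 0
    attack_cost = 0.0           # total corruption injected by the adversary

    for t in range(1, T + 1):
        if t <= K:              # pull each arm once to initialize
            arm = t - 1
        else:                   # UCB1 index: empirical mean + exploration bonus
            ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
            arm = int(np.argmax(ucb))

        reward = float(rng.random() < means[arm])   # true stochastic reward
        observed = reward
        if attack and arm != target:
            attack_cost += reward                   # corruption = reward itself
            observed = 0.0                          # learner sees a zero

        counts[arm] += 1
        sums[arm] += observed
        if arm == target:
            target_pulls += 1

    return target_pulls, attack_cost
```

Under this corruption, the suboptimal target arm (mean 0.5) is pulled in all but logarithmically many rounds, while without the attack UCB concentrates on the better arm; the adversary's total cost grows only logarithmically in the horizon, which is looser than the paper's $\widehat{O}(\sqrt{\log T})$ guarantee.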
