Robust Multi-Agent Bandits Over Undirected Graphs

Paper Title

Robust Multi-Agent Bandits Over Undirected Graphs

Authors

Daniel Vial, Sanjay Shakkottai, R. Srikant

Abstract

We consider a multi-agent multi-armed bandit setting in which $n$ honest agents collaborate over a network to minimize regret but $m$ malicious agents can disrupt learning arbitrarily. Assuming the network is the complete graph, existing algorithms incur $O( (m + K/n) \log (T) / Δ)$ regret in this setting, where $K$ is the number of arms and $Δ$ is the arm gap. For $m \ll K$, this improves over the single-agent baseline regret of $O(K\log(T)/Δ)$. In this work, we show the situation is murkier beyond the case of a complete graph. In particular, we prove that if the state-of-the-art algorithm is used on the undirected line graph, honest agents can suffer (nearly) linear regret until time is doubly exponential in $K$ and $n$. In light of this negative result, we propose a new algorithm for which the $i$-th agent has regret $O( ( d_{\text{mal}}(i) + K/n) \log(T)/Δ)$ on any connected and undirected graph, where $d_{\text{mal}}(i)$ is the number of $i$'s neighbors who are malicious. Thus, we generalize existing regret bounds beyond the complete graph (where $d_{\text{mal}}(i) = m$), and show the effect of malicious agents is entirely local (in the sense that only the $d_{\text{mal}}(i)$ malicious agents directly connected to $i$ affect its long-term regret).
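To make the locality claim concrete, the following is a minimal sketch (not the authors' algorithm) that computes $d_{\text{mal}}(i)$ and the leading factor $d_{\text{mal}}(i) + K/n$ of the stated bound on two small graphs. The helper names (`complete_graph`, `path_graph`, `d_mal`, `bound_factor`) and the sample values of $K$, $n$, and $m$ are assumptions chosen for illustration; constants hidden by the $O(\cdot)$ notation and the common $\log(T)/Δ$ term are ignored.

```python
from itertools import combinations


def complete_graph(nodes):
    """Undirected complete graph as an adjacency dict of sets."""
    adj = {v: set() for v in nodes}
    for u, v in combinations(nodes, 2):
        adj[u].add(v)
        adj[v].add(u)
    return adj


def path_graph(nodes):
    """Undirected line (path) graph: each node is linked to the next one."""
    adj = {v: set() for v in nodes}
    for u, v in zip(nodes, nodes[1:]):
        adj[u].add(v)
        adj[v].add(u)
    return adj


def d_mal(adj, malicious, i):
    """Number of agent i's neighbors that are malicious."""
    return len(adj[i] & malicious)


def bound_factor(adj, malicious, i, K, n):
    """Leading factor d_mal(i) + K/n of the abstract's regret bound.

    Illustrative only: hidden constants and log(T)/Delta are omitted.
    """
    return d_mal(adj, malicious, i) + K / n


# Hypothetical instance: n = 4 honest agents (0..3), m = 2 malicious (4, 5).
K, n = 20, 4
nodes = list(range(6))
malicious = {4, 5}

complete = complete_graph(nodes)
path = path_graph(nodes)

for i in range(n):
    print(
        f"agent {i}: complete={bound_factor(complete, malicious, i, K, n)}, "
        f"path={bound_factor(path, malicious, i, K, n)}"
    )
```

In this toy instance every honest agent on the complete graph has $d_{\text{mal}}(i) = m = 2$, while on the path only agent 3 is adjacent to a malicious agent, so the other honest agents have $d_{\text{mal}}(i) = 0$. Both factors are below the single-agent value $K = 20$.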

Paper Title