Simplex Neural Population Learning: Any-Mixture Bayes-Optimality in Symmetric Zero-sum Games

Paper Title


Simplex Neural Population Learning: Any-Mixture Bayes-Optimality in Symmetric Zero-sum Games

Authors

Siqi Liu, Marc Lanctot, Luke Marris, Nicolas Heess

Abstract


Learning to play optimally against any mixture over a diverse set of strategies is of significant practical interest in competitive games. In this paper, we propose simplex-NeuPL, which satisfies two desiderata simultaneously: i) learning a population of strategically diverse basis policies, represented by a single conditional network; ii) using the same network, learning best-responses to any mixture over the simplex of basis policies. We show that the resulting conditional policies incorporate prior information about their opponents effectively, enabling near-optimal returns against arbitrary mixture policies in a game with tractable best-responses. We verify that such policies behave Bayes-optimally under uncertainty and offer insights into using this flexibility at test time. Finally, we offer evidence that learning best-responses to any mixture policy is an effective auxiliary task for strategic exploration, which, by itself, can lead to more performant populations.
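simplex-NeuPL learns basis policies and mixture best-responses with a single conditional neural network; the sketch below does not reproduce that method. Instead it is a minimal tabular illustration, assuming rock-paper-scissors as the symmetric zero-sum game, of what "best-responding to a mixture over the simplex of basis policies" and "behaving Bayes-optimally under uncertainty" mean when best responses are tractable. The payoff matrix, the three hand-picked basis policies, and the helper names `best_response` and `posterior` are hypothetical choices for illustration only.

```python
import numpy as np

# Row player's payoff in rock-paper-scissors (actions 0=R, 1=P, 2=S).
# Symmetric zero-sum: A == -A.T. Assumed here purely as an illustrative game.
A = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])

# Hypothetical basis policies (one mixed strategy per row). In simplex-NeuPL
# these would be learned; here they are fixed by hand for illustration.
BASIS = np.array([
    [1.0, 0.0, 0.0],        # always rock
    [0.0, 1.0, 0.0],        # always paper
    [1 / 3, 1 / 3, 1 / 3],  # uniform random
])


def best_response(sigma, basis=BASIS, payoff=A):
    """Pure best response to an opponent drawn from mixture `sigma` over basis policies."""
    opponent = sigma @ basis      # opponent's marginal action distribution
    values = payoff @ opponent    # expected payoff of each of our actions
    return int(np.argmax(values)), values


def posterior(sigma, observed_actions, basis=BASIS):
    """Bayes update of the prior `sigma` given the opponent's observed actions."""
    post = np.asarray(sigma, dtype=float).copy()
    for a in observed_actions:
        post *= basis[:, a]       # likelihood of action `a` under each basis policy
    return post / post.sum()


prior = np.array([1 / 3, 1 / 3, 1 / 3])
print(best_response(prior)[0])        # 1 (paper) against the prior alone
post = posterior(prior, [1])          # opponent was seen playing paper once
print(post, best_response(post)[0])   # posterior [0, 0.75, 0.25]; best response 2 (scissors)
```

Under these assumptions the opponent's actions are revealed whatever we play, so best-responding to the posterior mixture at each round is Bayes-optimal against a fixed opponent sampled from the prior. A response tuned only to the prior would keep playing paper, which merely draws against the "always paper" basis policy that now dominates the posterior.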
