Paper Title
The Power of Regularization in Solving Extensive-Form Games
Paper Authors
Paper Abstract
In this paper, we investigate the power of {\it regularization}, a common technique in reinforcement learning and optimization, in solving extensive-form games (EFGs). We propose a series of new algorithms based on regularizing the payoff functions of the game, and establish a set of convergence results that strictly improve over the existing ones, with either weaker assumptions or stronger convergence guarantees. In particular, we first show that dilated optimistic mirror descent (DOMD), an efficient variant of OMD for solving EFGs, with adaptive regularization can achieve a fast $\tilde O(1/T)$ last-iterate convergence rate for the output of the algorithm in terms of duality gap and distance to the set of Nash equilibria (NE), without the uniqueness assumption on the NE. Second, we show that regularized counterfactual regret minimization (\texttt{Reg-CFR}), with a variant of the optimistic mirror descent algorithm as regret minimizer, can achieve $O(1/T^{1/4})$ best-iterate and $O(1/T^{3/4})$ average-iterate convergence rates for finding NE in EFGs. Finally, we show that \texttt{Reg-CFR} can achieve asymptotic last-iterate convergence, and an optimal $O(1/T)$ average-iterate convergence rate, for finding the NE of perturbed EFGs, which is useful for finding approximate extensive-form perfect equilibria (EFPE). To the best of our knowledge, these constitute the first last-iterate convergence results for CFR-type algorithms, while matching the state-of-the-art average-iterate convergence rate in finding NE for non-perturbed EFGs. We also provide numerical results to corroborate the advantages of our algorithms.
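To make the payoff-regularization idea concrete, here is a hedged sketch (not the paper's exact method): optimistic mirror descent with a negative-entropy mirror map, i.e. optimistic multiplicative weights, applied to an entropy-regularized zero-sum matrix game. This is only a normal-form analogue of regularizing the payoff function as the abstract describes; the dilated distance-generating functions that make DOMD efficient on the EFG tree are omitted. The game matrix `A` and the constants `mu`, `eta`, `T` below are illustrative choices, not the paper's setup.

```python
import numpy as np

def softmax(u):
    z = u - u.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def omd_regularized(A, mu=0.1, eta=0.05, T=4000):
    """Approximate  min_x max_y  x^T A y + mu*sum(x log x) - mu*sum(y log y)
    over the probability simplices, via optimistic multiplicative weights."""
    m, n = A.shape
    x = np.full(m, 1.0 / m)
    y = np.full(n, 1.0 / n)
    gx_prev, gy_prev = A @ y, A.T @ x       # seed the optimistic extrapolation
    for _ in range(T):
        gx = A @ y + mu * (np.log(x) + 1.0)    # grad of regularized loss in x
        gy = A.T @ x - mu * (np.log(y) + 1.0)  # grad of regularized payoff in y
        # optimistic step: use the extrapolated gradient 2*g_t - g_{t-1}
        x = softmax(np.log(x) - eta * (2 * gx - gx_prev))   # x descends
        y = softmax(np.log(y) + eta * (2 * gy - gy_prev))   # y ascends
        gx_prev, gy_prev = gx, gy
    return x, y

A = np.array([[1.0, 3.0],
              [4.0, 2.0]])                   # zero-sum game with an interior NE
x, y = omd_regularized(A)
# duality gap of the *last* iterate in the original (unregularized) game
gap = (A.T @ x).max() - (A @ y).min()
```

Because the entropy terms make the regularized problem strongly convex-concave, the last iterate itself converges (no averaging needed), and its duality gap in the original game is then bounded by a term of order `mu` — the same trade-off that adaptive regularization is designed to manage.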