论文标题
顺序决策问题的代数优化
Algebraic optimization of sequential decision problems
论文作者
论文摘要
我们研究了有限的马尔可夫决策过程中预期的长期奖励在一组固定随机策略方面的优化。在确定性观察的情况下,也称为状态聚集,问题等同于优化受二次约束的线性目标。我们将此问题的可行集表征为级别矩阵仿射品种的产物的相交。基于此描述,我们获得了优化问题关键点的数量的界限。最后,我们进行了实验,在这些实验中,我们在可行集合的不同边界组件上求解KKT方程或Lagrange方程,并将结果与理论界限以及其他受约束的优化方法进行比较。
We study the optimization of the expected long-term reward in finite partially observable Markov decision processes over the set of stationary stochastic policies. In the case of deterministic observations, also known as state aggregation, the problem is equivalent to optimizing a linear objective subject to quadratic constraints. We characterize the feasible set of this problem as the intersection of a product of affine varieties of rank one matrices and a polytope. Based on this description, we obtain bounds on the number of critical points of the optimization problem. Finally, we conduct experiments in which we solve the KKT equations or the Lagrange equations over different boundary components of the feasible set, and compare the result to the theoretical bounds and to other constrained optimization methods.