Time-Varying Parameters in Sequential Decision Making Problems

Paper Title

Time-Varying Parameters in Sequential Decision Making Problems

Authors

Srivastava, Amber; Salapaka, S. M.

Abstract

In this paper we address the class of Sequential Decision Making (SDM) problems that are characterized by time-varying parameters. The parameter dynamics are either pre-specified or manipulable. At any given time instant, the decision policy that governs the sequential decisions, together with all the parameter values, determines the cumulative cost incurred by the underlying SDM. The objective is therefore to determine the manipulable parameter dynamics as well as the time-varying decision policy such that the associated cost is minimized at each time instant. To this end we develop a control-theoretic framework to design the unknown parameter dynamics so that they locate and track the optimal values of the parameters while simultaneously determining the time-varying optimal sequential decision policy. Our methodology builds upon a Maximum Entropy Principle (MEP) based framework that addresses static parameterized SDMs. More precisely, we use the resulting smooth approximation of the cumulative cost (from the above framework) as a control Lyapunov function. We show that under the resulting control law the parameters asymptotically track the local optimum, that the proposed control law is Lipschitz continuous and bounded, and that the decision policy of the SDM is optimal for a given set of parameter values. Simulations demonstrate the efficacy of the proposed methodology.
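To make the idea of using a smooth cost approximation as a Lyapunov function concrete, here is a minimal Python sketch. It is not the control law from the paper. It assumes the smooth approximation takes the familiar soft-min (free-energy) form with a Gibbs decision policy, and it swaps in a plain gradient flow on one scalar parameter `y`. The two-path toy problem, the function names, and all numeric values are hypothetical.

```python
import math

def path_costs(y, s=0.0, d=4.0, fixed_cost=30.0):
    """Costs of two candidate paths from s to d (toy problem, assumed).
    Path 0 passes through a manipulable node at position y;
    path 1 is an alternative route with a constant cost."""
    return [(y - s) ** 2 + (d - y) ** 2, fixed_cost]

def path_cost_grads(y, s=0.0, d=4.0):
    """Derivative of each path cost with respect to the parameter y."""
    return [2.0 * (y - s) - 2.0 * (d - y), 0.0]

def gibbs_policy(costs, beta):
    """Maximum-entropy (Gibbs) distribution over paths at inverse temperature beta."""
    m = min(costs)  # shift by the minimum for numerical stability
    weights = [math.exp(-beta * (c - m)) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]

def free_energy(costs, beta):
    """Soft-min of the path costs: -(1/beta) * log(sum_i exp(-beta * c_i))."""
    m = min(costs)
    return m - math.log(sum(math.exp(-beta * (c - m)) for c in costs)) / beta

def simulate(y0=5.0, beta=0.5, gain=1.0, dt=0.01, steps=3000):
    """Forward-Euler integration of the gradient flow dy/dt = -gain * dF/dy.
    Along the continuous-time flow, dF/dt = -gain * (dF/dy)**2 <= 0,
    so the free energy F serves as a Lyapunov function for this toy system."""
    y = y0
    history = []
    for _ in range(steps):
        costs = path_costs(y)
        policy = gibbs_policy(costs, beta)
        # dF/dy is the policy-weighted average of the path-cost gradients.
        grad = sum(p * g for p, g in zip(policy, path_cost_grads(y)))
        history.append(free_energy(costs, beta))
        y -= dt * gain * grad
    return y, history

if __name__ == "__main__":
    y_final, history = simulate()
    print("final parameter:", y_final)
    print("free energy at start and end:", history[0], history[-1])
```

In this toy setting the parameter settles at the midpoint between `s` and `d`, where the cost of the path through `y` is smallest, and the recorded free energy does not increase along the trajectory. A gradient flow like this can stall when the Gibbs weight on the parameter-dependent path is very small, which is one reason a more carefully designed control law is of interest.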
