信息放松和随机动态程序的双重性驱动算法

论文标题

信息放松和随机动态程序的双重性驱动算法

Information Relaxation and A Duality-Driven Algorithm for Stochastic Dynamic Programs

论文作者

Chen, Nan, Ma, Xiang, Liu, Yanchu, Yu, Wei

论文摘要

我们使用信息放松技术来开发二元性驱动的迭代方法，以获取和改善有限 - 摩尼斯随机动态编程问题的真实价值的置信区间估计。我们表明，在有限数量的双迭代中，从所提出的方法中得出的双值估计序列是单调收敛到真实值函数的。为了克服各种应用中的维度诅咒，我们还引入了基于回归的蒙特卡洛算法进行实施。新方法不仅可以用来评估启发式政策的质量，而且还可以改善它们，如果我们发现它们的双重性差距很大。我们从基础函数和采样状态的量中获得了蒙特卡洛法的收敛速率。最后，我们证明了我们在市场摩擦的最佳订单执行问题以及在存在损失销售和交货时间的情况下的库存管理问题中的有效性。在文献中，这两个示例众所周知，难以解决最佳性。实验表明，我们的方法可以显着改善文献中建议的启发式方法，并获得具有令人满意的性能保证的新政策。

We use the technique of information relaxation to develop a duality-driven iterative approach to obtaining and improving confidence interval estimates for the true value of finite-horizon stochastic dynamic programming problems. We show that the sequence of dual value estimates yielded from the proposed approach in principle monotonically converges to the true value function in a finite number of dual iterations. Aiming to overcome the curse of dimensionality in various applications, we also introduce a regression-based Monte Carlo algorithm for implementation. The new approach can be used not only to assess the quality of heuristic policies, but also to improve them if we find that their duality gap is large. We obtain the convergence rate of our Monte Carlo method in terms of the amounts of both basis functions and the sampled states. Finally, we demonstrate the effectiveness of our method in an optimal order execution problem with market friction and in an inventory management problem in the presence of lost sale and lead time. Both examples are well known in the literature to be difficult to solve for optimality. The experiments show that our method can significantly improve the heuristics suggested in the literature and obtain new policies with a satisfactory performance guarantee.

下载PDF全文

下载文献需遵守相关版权规定

论文标题