学习马尔可夫连锁店和MDP的混合物

论文标题

学习马尔可夫连锁店和MDP的混合物

Learning Mixtures of Markov Chains and MDPs

论文作者

Kausik, Chinmaya, Tan, Kevin, Tewari, Ambuj

论文摘要

我们提出了一种从短轨迹轨迹中学习马尔可夫链和马尔可夫决策过程（MDP）混合物的算法。具体而言，我们的方法通过进行多步骤过程来处理马尔可夫链与可选控制输入的混合物，涉及（1）使用“成对距离估计器”的轨迹的子空间估计步骤，（2）使用“成对距离估计器”以及使用EM算法的细化，（3）模型估计步骤和（4）一个模型估计步骤和（4）实验室的实验步骤，将其进行改进。我们提供端到端的性能保证，其中只需要在状态数量中明确要求轨迹的长度是线性的，并且在混合时间参数中是线性的。实验结果支持了这些保证，在这个保证中，我们在两个MDP的混合物中达到了96.6％的平均准确性，而与随机初始化（73.2％的平均准确度）相比，EM算法的表现优于EM算法。

We present an algorithm for learning mixtures of Markov chains and Markov decision processes (MDPs) from short unlabeled trajectories. Specifically, our method handles mixtures of Markov chains with optional control input by going through a multi-step process, involving (1) a subspace estimation step, (2) spectral clustering of trajectories using "pairwise distance estimators," along with refinement using the EM algorithm, (3) a model estimation step, and (4) a classification step for predicting labels of new trajectories. We provide end-to-end performance guarantees, where we only explicitly require the length of trajectories to be linear in the number of states and the number of trajectories to be linear in a mixing time parameter. Experimental results support these guarantees, where we attain 96.6% average accuracy on a mixture of two MDPs in gridworld, outperforming the EM algorithm with random initialization (73.2% average accuracy).

下载PDF全文

下载文献需遵守相关版权规定

论文标题