Making Linear MDPs Practical via Contrastive Representation Learning

Paper Title

Making Linear MDPs Practical via Contrastive Representation Learning

Authors

Tianjun Zhang, Tongzheng Ren, Mengjiao Yang, Joseph E. Gonzalez, Dale Schuurmans, Bo Dai

Abstract

It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations. This motivates much of the recent theoretical study on linear MDPs. However, most approaches require a given representation under unrealistic assumptions about the normalization of the decomposition or introduce unresolved computational challenges in practice. Instead, we consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning via contrastive estimation. The framework also admits confidence-adjusted index algorithms, enabling an efficient and principled approach to incorporating optimism or pessimism in the face of uncertainty. To the best of our knowledge, this provides the first practical representation learning method for linear MDPs that achieves both strong theoretical guarantees and empirical performance. Theoretically, we prove that the proposed algorithm is sample efficient in both the online and offline settings. Empirically, we demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks.
