Paper Title
Tensor and Matrix Low-Rank Value-Function Approximation in Reinforcement Learning
Paper Authors
Paper Abstract
Value-function (VF) approximation is a central problem in Reinforcement Learning (RL). Classical non-parametric VF estimation suffers from the curse of dimensionality. As a result, parsimonious parametric models have been adopted to approximate VFs in high-dimensional spaces, with most efforts being focused on linear and neural-network-based approaches. Differently, this paper puts forth a parsimonious non-parametric approach, where we use stochastic low-rank algorithms to estimate the VF matrix in an online and model-free fashion. Furthermore, as VFs tend to be multi-dimensional, we propose replacing the classical VF matrix representation with a tensor (multi-way array) representation and, then, use the PARAFAC decomposition to design an online model-free tensor low-rank algorithm. Different versions of the algorithms are proposed, their complexity is analyzed, and their performance is assessed numerically using standardized RL environments.
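To make the matrix case of the abstract concrete, the following is a minimal sketch (a hypothetical illustration, not the authors' exact algorithm): the Q-value matrix is factored as a rank-`k` product of a state factor `L` and an action factor `R`, and both factors are updated online with a stochastic gradient step on the squared temporal-difference error. All names (`L`, `R`, `q`, `update`) and hyperparameters here are illustrative assumptions; the paper's tensor variant would replace this two-factor model with a PARAFAC decomposition of a multi-way VF array.

```python
import numpy as np

# Hypothetical sketch: approximate Q(s, a) by a rank-k factorization
# Q ~ L @ R.T and update both factors online, model-free, from sampled
# transitions via a TD-like stochastic gradient step.

rng = np.random.default_rng(0)
n_states, n_actions, k = 20, 4, 3
L = 0.5 * rng.standard_normal((n_states, k))   # left factor (over states)
R = 0.5 * rng.standard_normal((n_actions, k))  # right factor (over actions)

def q(s, a):
    """Low-rank Q-value estimate for state s and action a."""
    return float(L[s] @ R[a])

def update(s, a, r, s_next, alpha=0.05, gamma=0.9):
    """One online low-rank update from the transition (s, a, r, s_next)."""
    target = r + gamma * max(q(s_next, b) for b in range(n_actions))
    err = target - q(s, a)
    L_s = L[s].copy()              # snapshot before the coupled update
    L[s] += alpha * err * R[a]     # gradient step on the state factor
    R[a] += alpha * err * L_s      # gradient step on the action factor
    return err

# Toy check: with gamma = 0 the estimate q(3, 1) should drift toward the
# observed reward r = 1.0 as the same transition is replayed.
for _ in range(2000):
    update(s=3, a=1, r=1.0, s_next=7, gamma=0.0)
```

The snapshot of `L[s]` keeps the two gradient steps consistent with a single evaluation of the TD error, mirroring how alternating stochastic factor updates are typically written.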