Paper Title

Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient

Authors

Ming Yin, Mengdi Wang, Yu-Xiang Wang

Abstract

Offline reinforcement learning, which aims at optimizing sequential decision-making strategies with historical data, has been extensively applied in real-life applications. State-Of-The-Art algorithms usually leverage powerful function approximators (e.g. neural networks) to alleviate the sample complexity hurdle for better empirical performances. Despite the successes, a more systematic understanding of the statistical complexity for function approximation remains lacking. Towards bridging the gap, we take a step by considering offline reinforcement learning with differentiable function class approximation (DFA). This function class naturally incorporates a wide range of models with nonlinear/nonconvex structures. Most importantly, we show offline RL with differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning (PFQL) algorithm, and our results provide the theoretical basis for understanding a variety of practical heuristics that rely on Fitted Q-Iteration style design. In addition, we further improve our guarantee with a tighter instance-dependent characterization. We hope our work could draw interest in studying reinforcement learning with differentiable function approximation beyond the scope of current research.
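For readers less familiar with the Fitted Q-Iteration style design the abstract refers to, below is a minimal, hedged sketch of a pessimistic fitted Q-update on offline data. It stands in a linear-in-features Q model for the general differentiable function class and uses an elliptical uncertainty width as the pessimism penalty purely for illustration; the function pessimistic_fqi, its signature, and the specific penalty form are assumptions of this sketch, not the paper's actual PFQL construction.

```python
import numpy as np

def pessimistic_fqi(dataset, phi, num_actions, horizon, beta=1.0, reg=1e-3):
    """Backward fitted Q-iteration over an offline dataset with a pessimism penalty.

    dataset   : list of (state, action, reward, next_state) transitions.
    phi(s, a) : feature map returning a length-d numpy vector (a linear-in-features
                Q model stands in for the general differentiable function class here).
    Returns one weight vector per step h = 0, ..., horizon - 1.
    """
    d = len(phi(*dataset[0][:2]))
    X = np.stack([phi(s, a) for (s, a, _, _) in dataset])
    # Regularized empirical covariance of the behavior data; its inverse gives the
    # elliptical uncertainty width used as the pessimism bonus below.
    Sigma_inv = np.linalg.inv(X.T @ X + reg * np.eye(d))

    def q_lower(theta, s, a):
        f = phi(s, a)
        width = np.sqrt(f @ Sigma_inv @ f)         # uncertainty of this (s, a) pair
        return max(f @ theta - beta * width, 0.0)  # pessimistic estimate, clipped at 0

    thetas = [np.zeros(d) for _ in range(horizon + 1)]  # value beyond the horizon is 0
    for h in reversed(range(horizon)):
        # Regression target: reward plus the pessimistic value of the next state,
        # maximized over actions (the standard FQI-style Bellman backup).
        y = np.array([
            r + max(q_lower(thetas[h + 1], s2, a2) for a2 in range(num_actions))
            for (_, _, r, s2) in dataset
        ])
        # Ridge-regression fit of Q_h; closed form because this sketch's model is linear.
        thetas[h] = Sigma_inv @ (X.T @ y)
    return thetas[:horizon]
```

In this sketch, acting greedily with respect to q_lower(thetas[h], s, a) at each step h gives the learned pessimistic policy.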
