Paper Title

Guided Exploration in Reinforcement Learning via Monte Carlo Critic Optimization

Authors

Kuznetsov, Igor

Abstract

The class of deep deterministic off-policy algorithms is effectively applied to solve challenging continuous control problems. Current approaches commonly utilize random noise as an exploration method, which has several drawbacks, including the need for manual adjustment for a given task and the absence of exploratory calibration during the training process. We address these challenges by proposing a novel guided exploration method that uses an ensemble of Monte Carlo Critics for calculating exploratory action correction. The proposed method enhances the traditional exploration scheme by dynamically adjusting exploration. Subsequently, we present a novel algorithm that leverages the proposed exploratory module for both policy and critic modification. The presented algorithm demonstrates superior performance compared to modern reinforcement learning algorithms across a variety of problems in the DMControl suite.
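The mechanism described above can be sketched in code. This is a minimal, hypothetical illustration only: the critic ensemble here is a set of fixed quadratic functions standing in for learned Monte Carlo return estimators, and the correction rule (a small gradient-ascent step on the ensemble's mean value with respect to the action) is an assumption, not the paper's exact update.

```python
import numpy as np

# Hypothetical sketch of guided exploration via a Monte Carlo critic
# ensemble. All names, shapes, and the correction rule are illustrative
# assumptions; they are not taken from the paper.

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, K = 4, 2, 5

# Each "critic" is a fixed random concave quadratic Q(s, a), standing in
# for a learned Monte Carlo return estimator in the ensemble.
critics = [
    (rng.standard_normal(ACTION_DIM), rng.standard_normal(STATE_DIM))
    for _ in range(K)
]

def q_value(critic, state, action):
    w_a, w_s = critic
    return float(w_s @ state - 0.5 * np.sum((action - w_a) ** 2))

def ensemble_q(state, action):
    # Mean value over the critic ensemble.
    return np.mean([q_value(c, state, action) for c in critics])

def action_gradient(state, action, eps=1e-4):
    # Finite-difference gradient of the mean ensemble Q w.r.t. the action.
    grad = np.zeros_like(action)
    for i in range(ACTION_DIM):
        e = np.zeros(ACTION_DIM)
        e[i] = eps
        grad[i] = (ensemble_q(state, action + e)
                   - ensemble_q(state, action - e)) / (2 * eps)
    return grad

def guided_action(state, policy_action, noise_scale=0.1, alpha=0.05):
    # Traditional scheme: add random exploration noise to the policy action.
    noisy = policy_action + noise_scale * rng.standard_normal(ACTION_DIM)
    # Guided correction: nudge the noisy action toward higher ensemble value.
    return noisy + alpha * action_gradient(state, noisy)

state = rng.standard_normal(STATE_DIM)
base = np.zeros(ACTION_DIM)
corrected = guided_action(state, base)
```

The sketch keeps the usual noise-based exploration but replaces a purely random perturbation with one that is corrected by the critic ensemble, which is one plausible reading of "dynamically adjusting exploration" during training.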
