Paper Title

Efficient Exploration in Resource-Restricted Reinforcement Learning

Authors

Zhihai Wang, Taoxing Pan, Qi Zhou, Jie Wang

Abstract

In many real-world applications of reinforcement learning (RL), performing actions requires consuming certain types of resources that are non-replenishable within each episode. Typical applications include robotic control with limited energy and video games with consumable items. In tasks with non-replenishable resources, we observe that popular RL methods such as Soft Actor-Critic suffer from poor sample efficiency. The major reason is that they tend to exhaust resources quickly, so subsequent exploration is severely restricted by the absence of resources. To address this challenge, we first formalize the aforementioned problem as resource-restricted reinforcement learning, and then propose a novel resource-aware exploration bonus (RAEB) to make reasonable use of resources. An appealing feature of RAEB is that it can significantly reduce unnecessary resource-consuming trials while effectively encouraging the agent to explore unvisited states. Experiments demonstrate that the proposed RAEB significantly outperforms state-of-the-art exploration strategies in resource-restricted reinforcement learning environments, improving sample efficiency by up to an order of magnitude.
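The abstract does not spell out the exact form of RAEB, but the core idea it describes, rewarding visits to novel states while discouraging wasteful resource consumption, can be illustrated with a minimal sketch. The code below assumes a count-based novelty bonus modulated by a resource-consumption penalty; the class name, the rounding-based state discretization, and the coefficients `beta` and `eta` are hypothetical choices for illustration, not the paper's method.

```python
import numpy as np

# Minimal illustrative sketch, NOT the paper's actual RAEB formula (the
# abstract does not give one). It combines the two ingredients the abstract
# describes: a novelty bonus for rarely visited states and a penalty on
# unnecessary resource-consuming actions.

class ResourceAwareBonus:
    def __init__(self, beta: float = 0.1, eta: float = 0.1):
        self.counts = {}   # visit counts over discretized states
        self.beta = beta   # weight of the novelty term (hypothetical)
        self.eta = eta     # weight of the resource penalty (hypothetical)

    def __call__(self, state: np.ndarray, resource_cost: float,
                 resource_remaining: float) -> float:
        key = tuple(np.round(state, 1))  # crude tabular discretization
        self.counts[key] = self.counts.get(key, 0) + 1
        novelty = self.beta / np.sqrt(self.counts[key])
        # Penalize resource-consuming actions more as the budget shrinks,
        # so the agent avoids exhausting resources before it has explored.
        scarcity = 1.0 - resource_remaining  # 0 = full budget, 1 = empty
        return novelty - self.eta * resource_cost * scarcity

# Example: intrinsic reward added to the task reward at each step.
bonus = ResourceAwareBonus()
r_int = bonus(np.array([0.3, -1.2]), resource_cost=1.0, resource_remaining=0.4)
print(f"intrinsic reward: {r_int:.4f}")
```

In a setup like this, the returned bonus would be added to the environment reward when training an off-policy agent such as Soft Actor-Critic, so that novelty-seeking is traded off against conserving the episode's resource budget.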
