Paper Title
Factored Adaptation for Non-Stationary Reinforcement Learning
Paper Authors
Paper Abstract
Dealing with non-stationarity in environments (e.g., in the transition dynamics) and objectives (e.g., in the reward functions) is a challenging problem that is crucial in real-world applications of reinforcement learning (RL). While most current approaches model the changes as a single shared embedding vector, we leverage insights from the recent causality literature to model non-stationarity in terms of individual latent change factors and causal graphs across different environments. In particular, we propose Factored Adaptation for Non-Stationary RL (FANS-RL), a factored adaptation approach that jointly learns the causal structure in the form of a factored MDP and a factored representation of the individual time-varying change factors. We prove that, under standard assumptions, we can completely recover the causal graph representing the factored transition and reward functions, as well as a partial structure between the individual change factors and the state components. Through our general framework, we can handle general non-stationary scenarios with different function types and frequencies of change, including changes across episodes and within episodes. Experimental results demonstrate that FANS-RL outperforms existing approaches in terms of return, compactness of the latent state representation, and robustness to varying degrees of non-stationarity.
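As an illustrative sketch of the kind of factored structure the abstract describes (the notation below is our own, not necessarily the paper's), each state component depends only on a subset of parents, and non-stationarity enters through component-specific latent change factors:

$$
s_{i,t+1} \sim P_i\big(s_{i,t+1} \mid \mathrm{Pa}(s_i) \subseteq \{s_t, a_t\},\ \theta^{\mathrm{dyn}}_{i,t}\big),
\qquad
r_t \sim R\big(\mathrm{Pa}(r) \subseteq \{s_t, a_t\},\ \theta^{\mathrm{rew}}_t\big),
$$

where the parent sets $\mathrm{Pa}(\cdot)$ encode the causal graph of the factored MDP, and $\theta^{\mathrm{dyn}}_{i,t}$ and $\theta^{\mathrm{rew}}_t$ are low-dimensional latent change factors that may vary across or within episodes. Under this reading, adaptation amounts to jointly recovering the parent sets (e.g., as binary masks over state and action components) and inferring the trajectories of the individual change factors, rather than compressing all variation into a single shared embedding.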