A Framework for Following Temporal Logic Instructions with Unknown Causal Dependencies

Paper Title

A Framework for Following Temporal Logic Instructions with Unknown Causal Dependencies

Authors

Xu, Duo; Fekri, Faramarz

Abstract

Teaching a deep reinforcement learning (RL) agent to follow instructions in multi-task environments is a challenging problem. We assume that the user defines each task by a linear temporal logic (LTL) formula. However, some causal dependencies in complex environments may be unknown to the user in advance. Hence, when a human user specifies instructions, the robot cannot solve the tasks by simply following the given instructions. In this work, we propose a hierarchical reinforcement learning (HRL) framework in which a symbolic transition model is learned to efficiently produce high-level plans that guide the agent to solve different tasks. Specifically, the symbolic transition model is learned by inductive logic programming (ILP) to capture the logic rules of state transitions. By planning over the product of the symbolic transition model and the automaton derived from the LTL formula, the agent can resolve causal dependencies and break a causally complex problem down into a sequence of simpler low-level sub-tasks. We evaluate the proposed framework on three environments in both discrete and continuous domains, showing advantages over previous representative methods.
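
The planning step mentioned in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the rule table `RULES` stands in for a symbolic transition model of the kind ILP might learn, `dfa_step` is a hand-written automaton for the LTL formula "eventually gold", and plain breadth-first search stands in for whatever planner the authors actually use. The user only asks for gold, yet the plan also includes the sub-tasks that the hidden causal dependencies require.

```python
from collections import deque

# Hypothetical symbolic transition model: sub-task -> (preconditions, effects).
# These rules are invented for illustration; they are not from the paper.
RULES = {
    "get_wood": (frozenset(), frozenset({"wood"})),
    "make_axe": (frozenset({"wood"}), frozenset({"axe"})),
    "get_gold": (frozenset({"axe"}), frozenset({"gold"})),
}


def dfa_step(q, facts):
    """Hand-written DFA for 'eventually gold': 0 = waiting, 1 = accepting."""
    return 1 if q == 1 or "gold" in facts else 0


def plan(start_facts=frozenset(), q0=0, accepting=frozenset({1})):
    """Breadth-first search over product states (symbolic facts, DFA state)."""
    start_facts = frozenset(start_facts)
    start = (start_facts, dfa_step(q0, start_facts))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (facts, q), path = queue.popleft()
        if q in accepting:
            return path  # shortest sequence of sub-tasks satisfying the task
        for name, (pre, eff) in RULES.items():
            if pre <= facts:  # the sub-task is applicable in this symbolic state
                new_facts = facts | eff
                node = (new_facts, dfa_step(q, new_facts))
                if node not in seen:
                    seen.add(node)
                    queue.append((node, path + [name]))
    return None  # no sequence of sub-tasks reaches an accepting DFA state


print(plan())
```

In a hierarchical setup, each sub-task name in the returned plan would be handed to a low-level policy; that part is omitted here.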
