Paper Title

Learning to Guide Multiple Heterogeneous Actors from a Single Human Demonstration via Automatic Curriculum Learning in StarCraft II

Paper Authors

Nicholas Waytowich, James Hare, Vinicius G. Goecks, Mark Mittrick, John Richardson, Anjon Basak, Derrik E. Asher

Abstract

Traditionally, learning from human demonstrations via direct behavior cloning can lead to high-performance policies, provided the algorithm has access to large amounts of high-quality data covering the scenarios most likely to be encountered when the agent is operating. In real-world settings, however, expert data is limited, and it is desirable to train an agent whose behavior policy is general enough to handle situations that were not demonstrated by the human expert. An alternative is to learn these policies without supervision via deep reinforcement learning; however, these algorithms require a large amount of computing time to perform well on complex tasks with high-dimensional state and action spaces, such as those found in StarCraft II. Automatic curriculum learning is a recent family of techniques designed to speed up deep reinforcement learning by adjusting the difficulty of the current task according to the agent's current capabilities. Designing a proper curriculum, however, can be challenging for sufficiently complex tasks, so we leverage human demonstrations to guide agent exploration during training. In this work, we aim to train deep reinforcement learning agents that can command multiple heterogeneous actors, where the starting positions and overall difficulty of the task are controlled by a curriculum generated automatically from a single human demonstration. Our results show that an agent trained via automatic curriculum learning can outperform state-of-the-art deep reinforcement learning baselines and match the performance of a human expert in a simulated command-and-control task in StarCraft II modeled on a real military scenario.
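The core mechanism described above, adjusting task difficulty (here, the episode's starting point along a demonstrated trajectory) according to the agent's current capabilities, can be illustrated with a minimal sketch. This is not the paper's implementation: all names (`run_episode`, `curriculum_training`) and the toy success model are hypothetical, and the real system operates on StarCraft II states rather than integer stages.

```python
import random

# Hypothetical sketch of automatic curriculum learning from a single
# demonstration: episodes start near the END of the demonstrated
# trajectory (easy) and the start point moves earlier (harder) once
# the agent becomes reliable at the current stage.

def run_episode(start_state):
    """Stand-in for rolling out the agent from start_state.
    In this toy model, starts closer to the goal (higher index)
    succeed more often."""
    return random.random() < start_state / 10

def curriculum_training(demo, episodes=200, threshold=0.8, window=20):
    stage = len(demo) - 1          # begin near the demo's final state
    recent = []                    # sliding window of success flags
    for _ in range(episodes):
        recent.append(run_episode(demo[stage]))
        recent = recent[-window:]
        # Once the agent succeeds often enough at this stage,
        # move the start earlier in the demonstration.
        if len(recent) == window and sum(recent) / window >= threshold and stage > 0:
            stage -= 1
            recent = []
    return stage

random.seed(0)
final_stage = curriculum_training(list(range(11)))
print(final_stage)  # earliest demo stage the agent reached
```

The key design point, shared with the approach the abstract describes, is that difficulty is never hand-scheduled: the curriculum advances only when the measured success rate over recent episodes clears a threshold, so the agent always trains at the frontier of its current capability.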
