PRIMAL2: Pathfinding via Reinforcement and Imitation Multi-Agent Learning -- Lifelong

Paper Title

PRIMAL2: Pathfinding via Reinforcement and Imitation Multi-Agent Learning -- Lifelong

Authors

Mehul Damani, Zhiyao Luo, Emerson Wenzel, Guillaume Sartoretti

Abstract

Multi-agent path finding (MAPF) is an indispensable component of large-scale robot deployments in numerous domains ranging from airport management to warehouse automation. In particular, this work addresses lifelong MAPF (LMAPF) - an online variant of the problem where agents are immediately assigned a new goal upon reaching their current one - in dense and highly structured environments, typical of real-world warehouse operations. Effectively solving LMAPF in such environments requires expensive coordination between agents as well as frequent replanning abilities, a daunting task for existing coupled and decoupled approaches alike. With the purpose of achieving considerable agent coordination without any compromise on reactivity and scalability, we introduce PRIMAL2, a distributed reinforcement learning framework for LMAPF where agents learn fully decentralized policies to reactively plan paths online in a partially observable world. We extend our previous work, which was effective in low-density sparsely occupied worlds, to highly structured and constrained worlds by identifying behaviors and conventions which improve implicit agent coordination, and enable their learning through the construction of a novel local agent observation and various training aids. We present extensive results of PRIMAL2 in both MAPF and LMAPF environments and compare its performance to state-of-the-art planners in terms of makespan and throughput. We show that PRIMAL2 significantly surpasses our previous work and performs comparably to these baselines, while allowing real-time re-planning and scaling up to 2048 agents.
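
The abstract compares planners by makespan and throughput without defining either. As a minimal sketch, assuming the definitions commonly used in the MAPF literature (makespan: the timestep at which the last agent settles on its goal in a one-shot MAPF instance; throughput: the average number of goals reached per timestep in a lifelong run), the two metrics could be computed as below. The function names and the path representation are hypothetical and are not taken from the PRIMAL2 code.

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # (row, col) grid position; an assumed representation


def makespan(paths: List[List[Cell]], goals: List[Cell]) -> int:
    """Timestep at which the last agent arrives at its goal and stays there.

    paths[i][t] is the cell occupied by agent i at timestep t.
    """
    finish_times = []
    for path, goal in zip(paths, goals):
        t = len(path) - 1
        if path[t] != goal:
            raise ValueError("agent does not end on its goal")
        # Walk backwards over trailing steps where the agent just waits on its goal.
        while t > 0 and path[t - 1] == goal:
            t -= 1
        finish_times.append(t)
    return max(finish_times)


def throughput(goals_reached_per_step: List[int]) -> float:
    """Average number of goals reached per timestep over a lifelong run."""
    if not goals_reached_per_step:
        raise ValueError("need at least one timestep")
    return sum(goals_reached_per_step) / len(goals_reached_per_step)


if __name__ == "__main__":
    paths = [
        [(0, 0), (0, 1), (0, 2)],  # agent 0 arrives at t = 2
        [(1, 0), (1, 1), (1, 1)],  # agent 1 arrives at t = 1, then waits
    ]
    goals = [(0, 2), (1, 1)]
    print(makespan(paths, goals))
    print(throughput([0, 1, 0, 2, 1]))
```

Lower makespan is better in the one-shot setting, while higher throughput is better in the lifelong setting, where agents receive a new goal upon reaching their current one and no final timestep exists.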
