Paper Title

Deep reinforcement learning with a particle dynamics environment applied to emergency evacuation of a room with obstacles

Paper Authors

Yihao Zhang, Zhaojie Chai, George Lykotrafitis

Abstract

A very successful model for simulating emergency evacuation is the social-force model. At the heart of the model is the self-driven force that is applied to an agent and is directed towards the exit. However, it is not clear if the application of this force results in optimal evacuation, especially in complex environments with obstacles. Here, we develop a deep reinforcement learning algorithm in association with the social force model to train agents to find the fastest evacuation path. During training, we penalize every step of an agent in the room and give zero reward at the exit. We adopt the Dyna-Q learning approach. We first show that in the case of a room without obstacles the resulting self-driven force points directly towards the exit as in the social force model and that the median exit time intervals calculated using the two methods are not significantly different. Then, we investigate evacuation of a room with one obstacle and one exit. We show that our method produces results similar to those of the social force model when the obstacle is convex. However, in the case of concave obstacles, which sometimes can act as traps for agents governed purely by the social force model and prohibit complete room evacuation, our approach is clearly advantageous since it derives a policy that results in obstacle avoidance and complete room evacuation without additional assumptions. We also study evacuation of a room with multiple exits. We show that agents are able to evacuate efficiently from the nearest exit through a shared network trained for a single agent. Finally, we test the robustness of the Dyna-Q learning approach in a complex environment with multiple exits and obstacles. Overall, we show that our model can efficiently simulate emergency evacuation in complex environments with multiple room exits and obstacles where it is difficult to obtain an intuitive rule for fast evacuation.
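The reward scheme described in the abstract (a penalty for every step inside the room, zero reward at the exit) combined with Dyna-Q can be sketched on a toy grid "room". This is a minimal illustration of the tabular Dyna-Q idea only, with a hypothetical 5x5 layout and hyperparameters of our choosing, not the paper's particle-dynamics environment or deep network:

```python
import random

random.seed(0)

SIZE = 5                       # toy 5x5 grid room (hypothetical layout)
EXIT = (SIZE - 1, SIZE - 1)    # exit in one corner
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up
ALPHA, GAMMA, EPS, PLANNING_STEPS = 0.1, 0.95, 0.1, 20

# Tabular Q-values and a learned deterministic model (state, action) -> (reward, next state)
Q = {((x, y), a): 0.0 for x in range(SIZE) for y in range(SIZE)
     for a in range(len(ACTIONS))}
model = {}

def step(state, a):
    """One environment step: -1 per step in the room, 0 on reaching the exit."""
    dx, dy = ACTIONS[a]
    nxt = (min(max(state[0] + dx, 0), SIZE - 1),
           min(max(state[1] + dy, 0), SIZE - 1))
    return (0.0, nxt) if nxt == EXIT else (-1.0, nxt)

def greedy(state):
    return max(range(len(ACTIONS)), key=lambda a: Q[(state, a)])

for episode in range(200):
    s = (0, 0)
    while s != EXIT:
        a = random.randrange(len(ACTIONS)) if random.random() < EPS else greedy(s)
        r, s2 = step(s, a)
        # Direct RL update from real experience
        Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, b)] for b in range(len(ACTIONS))) - Q[(s, a)])
        # Record the transition, then replay simulated experience (the "Dyna" part)
        model[(s, a)] = (r, s2)
        for _ in range(PLANNING_STEPS):
            (ps, pa), (pr, ps2) = random.choice(list(model.items()))
            Q[(ps, pa)] += ALPHA * (pr + GAMMA * max(Q[(ps2, b)] for b in range(len(ACTIONS))) - Q[(ps, pa)])
        s = s2

# After training, the greedy action at the start should head toward the exit
# (action 0 = right or action 2 = down in this layout).
print(greedy((0, 0)) in (0, 2))
```

Because every step in the room is penalized equally, maximizing return is equivalent to minimizing the number of steps to the exit, which is why the learned greedy policy corresponds to the fastest evacuation path in this simplified setting.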
