Paper Title
Sim-To-Real Transfer for Miniature Autonomous Car Racing
Authors
Abstract
Sim-to-real, a term describing the practice of training a model in a simulator and then transferring it to the real world, is a technique that enables faster deep reinforcement learning (DRL) training. However, differences between the simulator and the real world often cause the model to perform poorly in the real world. Domain randomization is one way to bridge this sim-to-real gap: it exposes the model to a wide range of scenarios so that it can generalize to real-world situations. However, applying domain randomization when training an autonomous car racing model with DRL can lead to undesirable outcomes. Namely, a model trained with randomization tends to run slower: a higher completion rate on the testing track comes at the expense of longer lap times. This paper aims to boost the robustness of a trained race car model without compromising its lap times. For a training track and a testing track that have the same shape (and hence the same optimal path) but differ in lighting, background, etc., we first train a model (the teacher model) that overfits the training track and moves along a near-optimal path. We then use this model to teach a student model the correct actions while applying randomization. With our method, a teacher model with an 18.4% completion rate on the testing track is able to help teach a student model that achieves a 52% completion rate. Moreover, averaged over 50 trials, the student finishes a lap 0.23 seconds faster than the teacher. This 0.23-second gap, roughly 2% of a lap, is significant in tight races, where lap times are about 10 to 12 seconds.
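The teacher-student idea can be illustrated with a toy sketch. This is not the paper's implementation: the paper trains with DRL, whereas this sketch swaps in plain supervised imitation (a linear policy fitted by SGD on squared error). It also assumes that the teacher labels each state from the clean, non-randomized training-track view while the student sees a randomized view of the same state. The one-dimensional lateral-offset state, the lighting and background ranges, the teacher's spurious background weight, and the names `observe`, `teacher`, `train_student`, and `mse_on_randomized` are all hypothetical.

```python
import random


def observe(offset, rng=None):
    """Render a hypothetical observation [offset_feature, background_feature, bias].

    With rng=None this is the clean training-track view (background fixed at 0).
    Otherwise lighting gain and background are randomized (assumed ranges).
    """
    if rng is None:
        return [offset, 0.0, 1.0]
    gain = rng.uniform(0.7, 1.3)         # randomized lighting (assumption)
    background = rng.uniform(-1.0, 1.0)  # randomized background (assumption)
    return [gain * offset, background, 1.0]


def teacher(obs):
    """Hypothetical overfit teacher: exact on the clean track (steer = -offset),
    but it also leans on the background feature, which never varied in training."""
    return -obs[0] + 0.8 * obs[1]


def predict(weights, obs):
    """Linear student policy."""
    return sum(w * o for w, o in zip(weights, obs))


def train_student(n_samples=2000, epochs=20, lr=0.05, seed=0):
    """Fit the student by SGD on squared error against the teacher's labels."""
    rng = random.Random(seed)
    weights = [0.0, 0.0, 0.0]
    offsets = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
    for _ in range(epochs):
        for offset in offsets:
            target = teacher(observe(offset))  # teacher labels the clean view
            obs = observe(offset, rng)         # student sees a randomized view
            error = predict(weights, obs) - target
            weights = [w - lr * error * o for w, o in zip(weights, obs)]
    return weights


def mse_on_randomized(policy, n=2000, seed=1):
    """Mean squared steering error vs. the ideal action -offset on randomized views."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        offset = rng.uniform(-1.0, 1.0)
        total += (policy(observe(offset, rng)) + offset) ** 2
    return total / n


student_weights = train_student()


def student_policy(obs):
    return predict(student_weights, obs)


print("teacher MSE on randomized views:", round(mse_on_randomized(teacher), 3))
print("student MSE on randomized views:", round(mse_on_randomized(student_policy), 3))
```

In this toy setting the student learns to keep its background weight near zero, while the teacher, which is only exact on the clean view, degrades once lighting and background vary. Taking labels from the clean view is what lets the student inherit the teacher's near-optimal actions rather than rediscover them under randomization.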