Paper Title
MSVIPER: Improved Policy Distillation for Reinforcement-Learning-Based Robot Navigation
Paper Authors
Paper Abstract
We present Multiple Scenario Verifiable Reinforcement Learning via Policy Extraction (MSVIPER), a new method for policy distillation to decision trees for improved robot navigation. MSVIPER learns an "expert" policy using any Reinforcement Learning (RL) technique involving learning a state-action mapping, and then uses imitation learning to learn a decision-tree policy from it. We demonstrate that MSVIPER results in efficient decision trees that accurately mimic the behavior of the expert policy. Moreover, we present efficient policy distillation and tree-modification techniques that take advantage of the decision tree structure to allow improvements to a policy without retraining. We use our approach to improve the performance of RL-based robot navigation algorithms for indoor and outdoor scenes. We demonstrate the benefits in terms of reduced freezing and oscillation behaviors (up to 95% reduction) for mobile robots navigating among dynamic obstacles, and reduced vibration and oscillation (by up to 17%) for outdoor robot navigation on complex, uneven terrains.
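The abstract describes two mechanisms: distilling an RL expert into a decision-tree policy via imitation learning, and then editing the tree directly to remove undesirable behaviors without retraining. The sketch below illustrates the kind of VIPER-style distillation loop (DAgger with expert relabeling) that this family of methods builds on; it is not MSVIPER's actual algorithm or API. It assumes a Gymnasium environment with a discrete action space, and the names `rollout`, `distill`, and the heuristic `expert` are hypothetical stand-ins.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def rollout(env, policy, n_episodes=10):
    """Collect (state, action) pairs by running `policy` in `env`."""
    states, actions = [], []
    for _ in range(n_episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            act = policy(obs)
            states.append(obs)
            actions.append(act)
            obs, _, terminated, truncated, _ = env.step(act)
            done = terminated or truncated
    return np.array(states), np.array(actions)


def distill(env, expert, n_iters=5, max_leaf_nodes=64):
    """DAgger-style distillation: fit a tree on expert data, then
    repeatedly run the tree and relabel the states it actually visits
    with the expert's actions, so the tree learns to recover from its
    own mistakes."""
    states, actions = rollout(env, expert)
    tree = DecisionTreeClassifier(max_leaf_nodes=max_leaf_nodes)
    for _ in range(n_iters):
        tree.fit(states, actions)
        visited, _ = rollout(
            env, lambda s: int(tree.predict(np.asarray(s).reshape(1, -1))[0])
        )
        relabeled = np.array([expert(s) for s in visited])
        states = np.concatenate([states, visited])
        actions = np.concatenate([actions, relabeled])
    return tree


if __name__ == "__main__":
    import gymnasium as gym

    env = gym.make("CartPole-v1")
    # Stand-in "expert": a hand-written heuristic in place of a trained
    # RL policy (any state-action mapping works here).
    expert = lambda obs: int(obs[2] + obs[3] > 0.0)
    tree = distill(env, expert)
```

Because the resulting policy is an explicit tree, a leaf that produces freezing or oscillation can in principle be patched in place. The snippet below forces a chosen action at one leaf by overwriting scikit-learn's internal class scores; it is a fragile illustration of the general idea, not the paper's tree-modification technique, and `bad_state` is a hypothetical problem state.

```python
# Patch one leaf of the distilled tree without retraining. This pokes
# scikit-learn internals (tree_.value), shown for illustration only.
bad_state = np.zeros(env.observation_space.shape)  # hypothetical problem state
leaf = tree.apply(bad_state.reshape(1, -1))[0]     # leaf index for that state
tree.tree_.value[leaf, 0, :] = 0.0                 # clear the leaf's class scores
tree.tree_.value[leaf, 0, 1] = 1.0                 # force action 1 at this leaf
```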