Paper Title
Self-Adaptive Driving in Nonstationary Environments through Conjectural Online Lookahead Adaptation
Paper Authors
Paper Abstract
Powered by deep representation learning, reinforcement learning (RL) provides an end-to-end learning framework capable of solving self-driving (SD) tasks without manual design. However, time-varying nonstationary environments cause proficient but specialized RL policies to fail at execution time. For example, an RL-based SD policy trained in sunny weather does not generalize well to rainy weather. Even though meta-learning enables the RL agent to adapt to new tasks/environments, its offline operation fails to equip the agent with online adaptation ability when facing nonstationary environments. This work proposes an online meta reinforcement learning algorithm based on \emph{conjectural online lookahead adaptation} (COLA). COLA determines the online adaptation at every step by maximizing the agent's conjecture of its future performance over a lookahead horizon. Experimental results demonstrate that under dynamically changing weather and lighting conditions, COLA-based self-adaptive driving outperforms baseline policies in terms of online adaptability. A demo video, source code, and appendices are available at {\tt https://github.com/Panshark/COLA}.
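To make the abstract's description of the adaptation mechanism concrete, here is a minimal, hypothetical sketch of a lookahead adaptation step: at each control step, the agent nudges its policy parameters in the direction that maximizes its conjectured return over a short lookahead horizon. This is not the authors' implementation; the names `cola_style_update` and `conjectured_return`, the linear policy, and the toy dynamics/reward are all illustrative assumptions.

```python
import numpy as np

def conjectured_return(theta, state, horizon, model, reward_fn):
    """Roll the agent's conjectured dynamics model forward for `horizon` steps
    and accumulate predicted rewards under a linear policy a = theta @ s."""
    total, s = 0.0, state
    for _ in range(horizon):
        a = theta @ s                      # linear policy (illustrative only)
        s = model(s, a)                    # conjectured next state
        total += reward_fn(s, a)           # conjectured reward
    return total

def cola_style_update(theta, state, horizon, model, reward_fn, lr=1e-2, eps=1e-4):
    """One online adaptation step: finite-difference gradient ascent on the
    conjectured lookahead return (a stand-in for the paper's COLA update)."""
    grad = np.zeros_like(theta)
    base = conjectured_return(theta, state, horizon, model, reward_fn)
    for idx in np.ndindex(theta.shape):
        pert = theta.copy()
        pert[idx] += eps
        grad[idx] = (conjectured_return(pert, state, horizon, model, reward_fn) - base) / eps
    return theta + lr * grad

# Toy usage: 2-D state, 1-D action, conjectured linear dynamics, quadratic cost.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
model = lambda s, a: A @ s + (B @ a).ravel()
reward_fn = lambda s, a: -(s @ s) - 0.01 * float(a @ a)

theta = np.zeros((1, 2))
state = np.array([1.0, 0.0])
theta = cola_style_update(theta, state, horizon=5, model=model, reward_fn=reward_fn)
```

In the paper's setting the conjectured model and return would come from the meta-trained agent itself, and the update would be applied repeatedly as the environment (weather, lighting) drifts; the finite-difference gradient here is only a simple stand-in for whichever optimizer the method actually uses.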