Paper Title
Self-Adaptive Driving in Nonstationary Environments through Conjectural Online Lookahead Adaptation
Paper Authors
Paper Abstract
Powered by deep representation learning, reinforcement learning (RL) provides an end-to-end learning framework capable of solving self-driving (SD) tasks without manual design. However, time-varying nonstationary environments cause proficient but specialized RL policies to fail at execution time. For example, an RL-based SD policy trained in sunny weather does not generalize well to rainy weather. Even though meta-learning enables the RL agent to adapt to new tasks/environments, its offline operation fails to equip the agent with online adaptation ability when facing nonstationary environments. This work proposes an online meta reinforcement learning algorithm based on \emph{conjectural online lookahead adaptation} (COLA). COLA determines the online adaptation at every step by maximizing the agent's conjecture of its future performance over a lookahead horizon. Experimental results demonstrate that under dynamically changing weather and lighting conditions, COLA-based self-adaptive driving outperforms baseline policies in terms of online adaptability. A demo video, source code, and appendices are available at {\tt https://github.com/Panshark/COLA}.
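To make the abstract's description of the adaptation mechanism concrete, here is a minimal, hypothetical sketch of a lookahead adaptation step: at each control step, the agent nudges its policy parameters in the direction that maximizes its conjectured return over a short lookahead horizon. This is not the authors' implementation; the names `cola_style_update` and `conjectured_return`, the linear policy, and the toy dynamics/reward are all illustrative assumptions.

```python
import numpy as np

def conjectured_return(theta, state, horizon, model, reward_fn):
    """Roll the agent's conjectured dynamics model forward for `horizon` steps
    and accumulate predicted rewards under a linear policy a = theta @ s."""
    total, s = 0.0, state
    for _ in range(horizon):
        a = theta @ s                      # linear policy (illustrative only)
        s = model(s, a)                    # conjectured next state
        total += reward_fn(s, a)           # conjectured reward
    return total

def cola_style_update(theta, state, horizon, model, reward_fn, lr=1e-2, eps=1e-4):
    """One online adaptation step: finite-difference gradient ascent on the
    conjectured lookahead return (a stand-in for the paper's COLA update)."""
    grad = np.zeros_like(theta)
    base = conjectured_return(theta, state, horizon, model, reward_fn)
    for idx in np.ndindex(theta.shape):
        pert = theta.copy()
        pert[idx] += eps
        grad[idx] = (conjectured_return(pert, state, horizon, model, reward_fn) - base) / eps
    return theta + lr * grad

# Toy usage: 2-D state, 1-D action, conjectured linear dynamics, quadratic cost.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
model = lambda s, a: A @ s + (B @ a).ravel()
reward_fn = lambda s, a: -(s @ s) - 0.01 * float(a @ a)

theta = np.zeros((1, 2))
state = np.array([1.0, 0.0])
theta = cola_style_update(theta, state, horizon=5, model=model, reward_fn=reward_fn)
```

In the paper's setting the conjectured model and return would come from the meta-trained agent itself, and the update would be applied repeatedly as the environment (weather, lighting) drifts; the finite-difference gradient here is only a simple stand-in for whichever optimizer the method actually uses.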