Think Too Fast Nor Too Slow: The Computational Trade-off Between Planning And Reinforcement Learning

Paper title

Think Too Fast Nor Too Slow: The Computational Trade-off Between Planning And Reinforcement Learning

Authors

Thomas M. Moerland, Anna Deichler, Simone Baldi, Joost Broekens, Catholijn M. Jonker

Abstract

Planning and reinforcement learning are two key approaches to sequential decision making. Multi-step approximate real-time dynamic programming, a recently successful algorithm class of which AlphaZero [Silver et al., 2018] is an example, combines both by nesting planning within a learning loop. However, the combination of planning and learning introduces a new question: how should we balance the time spent on planning, learning and acting? The importance of this trade-off has not been explicitly studied before. We show that it is actually of key importance, with computational results indicating that we should plan neither too long nor too short. Conceptually, we identify a new spectrum of planning-learning algorithms which ranges from exhaustive search (long planning) to model-free RL (no planning), with optimal performance achieved midway.
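
To make the spectrum described in the abstract concrete, here is a minimal sketch of planning nested within a learning loop. It is not the authors' algorithm or experimental setup. It assumes a hypothetical six-state chain environment with a known deterministic model, a tabular Q-function, and a simple depth-limited lookahead as the planner. All names (`step`, `plan`, `train`, `depth`) are illustrative. Setting `depth=0` reduces the loop to ordinary model-free Q-learning, while a large `depth` approaches exhaustive search from every visited state.

```python
import random

N_STATES = 6        # hypothetical chain: states 0..5, state 5 is the goal
ACTIONS = (-1, +1)  # move left or move right
GAMMA = 0.95        # discount factor


def step(state, action):
    """Known deterministic model of the toy chain (an assumption of this sketch)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    reward = 1.0 if done else 0.0
    return nxt, reward, done


def plan(state, q, depth):
    """Depth-limited lookahead that bootstraps from the Q-table at the leaves.

    Returns one value estimate per action. The cost grows exponentially with
    depth, which is the planning side of the compute trade-off.
    """
    if depth == 0:
        return list(q[state])
    values = []
    for action in ACTIONS:
        nxt, reward, done = step(state, action)
        future = 0.0 if done else max(plan(nxt, q, depth - 1))
        values.append(reward + GAMMA * future)
    return values


def train(depth, episodes=50, alpha=0.5, epsilon=0.1, seed=0):
    """Learning loop with planning nested inside; depth=0 is plain Q-learning."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _t in range(100):  # cap on episode length
            # Plan: estimate action values by looking ahead from the current state.
            values = plan(state, q, depth)
            # Act: epsilon-greedy over the planned values, random tie-breaking.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                best = max(values)
                a = rng.choice([i for i, v in enumerate(values) if v == best])
            nxt, reward, done = step(state, ACTIONS[a])
            # Learn: move Q toward a target that plans from the next state.
            future = 0.0 if done else max(plan(nxt, q, depth))
            q[state][a] += alpha * (reward + GAMMA * future - q[state][a])
            if done:
                break
            state = nxt
    return q
```

In this toy setting, `train(depth=0)` spends no compute on planning and relies entirely on learned values, whereas `train(depth=5)` can already see the goal from the start state before anything has been learned. The intermediate depths correspond to the middle of the spectrum that the abstract argues is worth studying under a fixed compute budget.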
