Paper Title

Control as Hybrid Inference

Paper Authors

Alexander Tschantz, Beren Millidge, Anil K. Seth, Christopher L. Buckley

Paper Abstract

The field of reinforcement learning can be split into model-based and model-free methods. Here, we unify these approaches by casting model-free policy optimisation as amortised variational inference, and model-based planning as iterative variational inference, within a `control as hybrid inference' (CHI) framework. We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference. Using a didactic experiment, we demonstrate that the proposed algorithm operates in a model-based manner at the onset of learning, before converging to a model-free algorithm once sufficient data have been collected. We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines. CHI thus provides a principled framework for harnessing the sample efficiency of model-based planning while retaining the asymptotic performance of model-free policy optimisation.
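To make the abstract's distinction between amortised and iterative inference concrete, here is a minimal, illustrative sketch, not the paper's actual CHI algorithm: an amortised proposal (standing in for a learned policy network) initialises a Gaussian over action sequences, which is then refined by iterative, cross-entropy-method-style planning under a toy learned model. All function names, dimensions, and the dynamics/reward model below are hypothetical placeholders.

# Illustrative sketch only: NOT the paper's CHI algorithm. It shows, generically,
# how an amortised proposal ("policy") can be combined with iterative refinement
# ("planning" under a learned model), the two ingredients the abstract refers to.
import numpy as np

rng = np.random.default_rng(0)
ACTION_DIM, HORIZON, N_SAMPLES, N_ITERS, N_ELITE = 2, 5, 64, 3, 8

def amortised_proposal(state):
    """Stand-in for a learned policy network: state -> Gaussian over action sequences."""
    mean = np.tile(np.tanh(state[:ACTION_DIM]), (HORIZON, 1))   # (HORIZON, ACTION_DIM)
    std = np.ones((HORIZON, ACTION_DIM))
    return mean, std

def model_return(state, actions):
    """Stand-in for a learned dynamics + reward model, scoring one action sequence."""
    s = state.copy()
    total = 0.0
    for a in actions:
        s = 0.9 * s + 0.1 * np.pad(a, (0, len(s) - len(a)))     # toy linear dynamics
        total += -np.sum(s ** 2) - 0.01 * np.sum(a ** 2)        # toy quadratic reward
    return total

def hybrid_action(state):
    # 1) Amortised inference: the policy provides the initial action-sequence distribution.
    mean, std = amortised_proposal(state)
    # 2) Iterative inference: refine that distribution by planning with the model.
    for _ in range(N_ITERS):
        samples = rng.normal(mean, std, size=(N_SAMPLES, HORIZON, ACTION_DIM))
        scores = np.array([model_return(state, seq) for seq in samples])
        elite = samples[np.argsort(scores)[-N_ELITE:]]           # keep best sequences
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    return mean[0]                                               # execute first refined action

state = rng.normal(size=4)
print("chosen action:", hybrid_action(state))

In this toy setting, acting from the amortised proposal alone would correspond to a model-free policy, while the refinement loop corresponds to model-based planning; a hybrid scheme can shift weight between the two as data accumulate, which is the trade-off the abstract describes.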
