Title

A Simple Decentralized Cross-Entropy Method

Authors

Zichen Zhang, Jun Jin, Martin Jagersand, Jun Luo, Dale Schuurmans

Abstract

The Cross-Entropy Method (CEM) is commonly used for planning in model-based reinforcement learning (MBRL), where a centralized approach is typically used to update the sampling distribution based only on the results of a top-$k$ operation over the samples. In this paper, we show that such a centralized approach makes CEM vulnerable to local optima, thus impairing its sample efficiency. To tackle this issue, we propose Decentralized CEM (DecentCEM), a simple but effective improvement over classical CEM that uses an ensemble of CEM instances running independently of one another, each performing a local improvement of its own sampling distribution. We provide both theoretical and empirical analysis to demonstrate the effectiveness of this simple decentralized approach. We empirically show that, compared to the classical centralized approach using either a single Gaussian or even a mixture of Gaussians, DecentCEM finds the global optimum much more consistently and thus improves sample efficiency. Furthermore, we plug DecentCEM into the planning problem of MBRL and evaluate our approach in several continuous control environments, comparing against state-of-the-art CEM-based MBRL approaches (PETS and POPLIN). Results show that simply replacing the classical CEM module with our DecentCEM module improves sample efficiency, at the cost of only a reasonable amount of additional computation. Lastly, we conduct ablation studies for more in-depth analysis. Code is available at https://github.com/vincentzhang/decentCEM
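
The contrast described above, one centralized sampling distribution refit to its top-k samples versus several independent CEM instances that each refine their own distribution, can be illustrated with a minimal, hypothetical Python sketch. This is not the authors' implementation (that is in the linked repository). Assumptions made here for illustration only: a 1-D toy objective stands in for the planning objective (in MBRL, x would be an action sequence and the score its predicted return under the learned model); the names `objective`, `cem`, `decent_cem` and `elite_frac` are invented; the total sample budget is split evenly across instances; and the ensemble's answer is taken to be the final instance mean with the best score.

```python
import math
import random
import statistics


def objective(x: float) -> float:
    """Hypothetical 1-D test function (to maximize): a local optimum near
    x = -2 with value ~1 and the global optimum near x = 3 with value ~2."""
    return math.exp(-(x + 2.0) ** 2) + 2.0 * math.exp(-(x - 3.0) ** 2)


def cem(f, mu, sigma, n_samples=100, elite_frac=0.2, iters=30, rng=None):
    """Classical (centralized) CEM with a single Gaussian.

    Each iteration draws samples, keeps the top-k by score
    (k = elite_frac * n_samples) and refits the Gaussian's mean and
    standard deviation to those elites only.
    """
    rng = rng or random.Random(0)
    k = max(2, int(elite_frac * n_samples))
    for _ in range(iters):
        samples = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        elites = sorted(samples, key=f, reverse=True)[:k]
        mu = statistics.fmean(elites)
        sigma = max(statistics.pstdev(elites), 1e-3)  # floor keeps sigma > 0
    return mu


def decent_cem(f, init_mus, sigma, n_samples=100, elite_frac=0.2, iters=30, seed=0):
    """Decentralized sketch: independent CEM instances, each refining only
    its own Gaussian on an equal share of the sample budget (an assumed
    split), then returning the final mean that scores best under f."""
    rng = random.Random(seed)
    per_instance = n_samples // len(init_mus)
    finals = [cem(f, m, sigma, per_instance, elite_frac, iters, rng) for m in init_mus]
    return max(finals, key=f)


if __name__ == "__main__":
    single = cem(objective, mu=-2.0, sigma=1.0)
    ensemble = decent_cem(objective, init_mus=[-4.0, -2.0, 0.0, 2.0, 4.0], sigma=1.0)
    print(f"classical CEM: x = {single:.2f}, f = {objective(single):.2f}")
    print(f"decentralized: x = {ensemble:.2f}, f = {objective(ensemble):.2f}")
```

In this toy setup the single Gaussian starts in the basin of the inferior optimum; its elites are always the samples closest to x = -2, so it keeps refining that mode and never reaches x = 3. In the ensemble, instances that start near x = 2 or x = 4 climb the global optimum on their own, and selecting the best final mean recovers it. The initial means and the even budget split are illustrative choices, not settings taken from the paper.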