Paper Title

Incorporating Explicit Uncertainty Estimates into Deep Offline Reinforcement Learning

Paper Authors

David Brandfonbrener, Remi Tachet des Combes, Romain Laroche

Paper Abstract

Most theoretically motivated work in the offline reinforcement learning setting requires precise uncertainty estimates. This requirement restricts the algorithms derived in that work to the tabular and linear settings where such estimates exist. In this work, we develop a novel method for incorporating scalable uncertainty estimates into an offline reinforcement learning algorithm called deep-SPIBB that extends the SPIBB family of algorithms to environments with larger state and action spaces. We use recent innovations in uncertainty estimation from the deep learning community to get more scalable uncertainty estimates to plug into deep-SPIBB. While these uncertainty estimates do not allow for the same theoretical guarantees as in the tabular case, we argue that the SPIBB mechanism for incorporating uncertainty is more robust and flexible than pessimistic approaches that incorporate the uncertainty as a value function penalty. We bear this out empirically, showing that deep-SPIBB outperforms pessimism based approaches with access to the same uncertainty estimates and performs at least on par with a variety of other strong baselines across several environments and datasets.
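
For readers unfamiliar with the distinction drawn in the abstract, the two mechanisms can be sketched roughly as follows; the notation (uncertainty estimate $u(s,a)$, penalty weight $\alpha$, counts $N_{\mathcal{D}}(s,a)$, threshold $N_\wedge$) is illustrative and borrowed from the tabular SPIBB literature, not taken from this paper. Pessimism-based methods fold the uncertainty into the value function as a penalty:

$$\tilde{Q}(s,a) = \hat{Q}(s,a) - \alpha\, u(s,a), \qquad \pi(s) \in \arg\max_a \tilde{Q}(s,a).$$

Tabular SPIBB instead constrains the policy-improvement step: on state-action pairs with too few observations, the new policy keeps the behavior policy's probability $\pi_b(a \mid s)$, and the greedy improvement is taken only over the well-covered actions:

$$\pi_{\text{SPIBB}}(\cdot \mid s) \in \arg\max_{\pi} \sum_a \pi(a \mid s)\, \hat{Q}(s,a) \quad \text{s.t.} \quad \pi(a \mid s) = \pi_b(a \mid s) \ \text{whenever } N_{\mathcal{D}}(s,a) < N_\wedge.$$

Deep-SPIBB, as described above, generalizes this constraint-style use of uncertainty by replacing the counts with scalable uncertainty estimates; the precise formulation is given in the paper.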
