Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error

Paper Title

Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error

Authors

Scott Fujimoto, David Meger, Doina Precup, Ofir Nachum, Shixiang Shane Gu

Abstract

In this work, we study the use of the Bellman equation as a surrogate objective for value prediction accuracy. While the Bellman equation is uniquely solved by the true value function over all state-action pairs, we find that the Bellman error (the difference between both sides of the equation) is a poor proxy for the accuracy of the value function. In particular, we show that (1) due to cancellations from both sides of the Bellman equation, the magnitude of the Bellman error is only weakly related to the distance to the true value function, even when considering all state-action pairs, and (2) in the finite data regime, the Bellman equation can be satisfied exactly by infinitely many suboptimal solutions. This means that the Bellman error can be minimized without improving the accuracy of the value function. We demonstrate these phenomena through a series of propositions, illustrative toy examples, and empirical analysis in standard benchmark domains.
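Both claims in the abstract can be illustrated on a tiny tabular example. The sketch below is an illustration of our own, not code from the paper. It assumes a hypothetical 3-state Markov reward process (a fixed policy folded into the transition matrix `P`, so state values stand in for state-action values), and the names `bellman_error`, `dataset_bellman_error`, and `fit_exact` are hypothetical. Because the true value function satisfies the Bellman equation, the Bellman error of an estimate V equals ε - γPε, where ε = V - V^π is its value error. A constant error ε = c therefore produces a Bellman error of only (1 - γ)c (part 1). Part 2 uses a two-transition dataset in which state 2 never appears as a starting state, so every choice of V[2] yields zero Bellman error on the data.

```python
import numpy as np

# Hypothetical toy example (not from the paper): a 3-state Markov reward
# process in which a fixed policy has been folded into the transition matrix P.
GAMMA = 0.99
P = np.array([[0.0, 1.0, 0.0],   # state 0 -> state 1
              [0.0, 0.0, 1.0],   # state 1 -> state 2
              [1.0, 0.0, 0.0]])  # state 2 -> state 0
R = np.array([1.0, 0.0, 2.0])


def true_values(P, R, gamma):
    """Solve the Bellman equation V = R + gamma * P V exactly."""
    return np.linalg.solve(np.eye(len(R)) - gamma * P, R)


def bellman_error(V, P, R, gamma):
    """Per-state Bellman error over all states: V - (R + gamma * P V)."""
    return V - (R + gamma * P @ V)


V_true = true_values(P, R, GAMMA)

# (1) Cancellation: offset every estimate by the same constant c.
# The value error is c everywhere, but the Bellman error is only (1 - gamma) * c.
c = 100.0
V_shift = V_true + c
value_err = np.abs(V_shift - V_true).max()
bell_err = np.abs(bellman_error(V_shift, P, R, GAMMA)).max()
print(f"value error = {value_err:.2f}, Bellman error = {bell_err:.2f}")

# (2) Finite data: (s, r, s') transitions that never start from state 2,
# so nothing in the dataset constrains V[2].
dataset = [(0, 1.0, 1), (1, 0.0, 2)]


def dataset_bellman_error(V, data, gamma):
    """Bellman error evaluated only on the sampled transitions."""
    return np.array([V[s] - (r + gamma * V[s_next]) for s, r, s_next in data])


def fit_exact(free_value, data, gamma, n_states=3):
    """Return a V with zero dataset Bellman error for any value of the
    unconstrained final state (assumes `data` forms a single chain)."""
    V = np.zeros(n_states)
    V[data[-1][2]] = free_value
    # Walk the chain backwards so each transition is satisfied exactly.
    for s, r, s_next in reversed(data):
        V[s] = r + gamma * V[s_next]
    return V


for free in (-50.0, 0.0, 50.0):
    V_fit = fit_exact(free, dataset, GAMMA)
    print(f"V[2] = {free:6.1f}: dataset Bellman error = "
          f"{np.abs(dataset_bellman_error(V_fit, dataset, GAMMA)).max():.2f}, "
          f"value error = {np.abs(V_fit - V_true).max():.2f}")
```

With γ = 0.99, the constant offset of 100 leaves a Bellman error of only 1 at every state. Each iteration in part 2 reports zero Bellman error on the dataset, while the value error changes with the arbitrary choice of V[2].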
