Settling the Reward Hypothesis

Paper title

Settling the Reward Hypothesis

Authors

Michael Bowling, John D. Martin, David Abel, Will Dabney

Abstract

The reward hypothesis posits that, "all of what we mean by goals and purposes can be well thought of as maximization of the expected value of the cumulative sum of a received scalar signal (reward)." We aim to fully settle this hypothesis. This will not conclude with a simple affirmation or refutation, but rather specify completely the implicit requirements on goals and purposes under which the hypothesis holds.

