Affordance as general value function: A computational model

Paper title

Affordance as general value function: A computational model

Authors

Daniel Graves, Johannes Günther, Jun Luo

Abstract

General value functions (GVFs) in the reinforcement learning (RL) literature are long-term predictive summaries of the outcomes of agents following specific policies in the environment. Affordances as perceived action possibilities with specific valence may be cast into predicted policy-relative goodness and modelled as GVFs. A systematic explication of this connection shows that GVFs and especially their deep learning embodiments (1) realize affordance prediction as a form of direct perception, (2) illuminate the fundamental connection between action and perception in affordance, and (3) offer a scalable way to learn affordances using RL methods. Through an extensive review of existing literature on GVF applications and representative affordance research in robotics, we demonstrate that GVFs provide the right framework for learning affordances in real-world applications. In addition, we highlight a few new avenues of research opened up by the perspective of "affordance as GVF", including using GVFs for orchestrating complex behaviors.
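
To make the notion of a GVF as a "long-term predictive summary" of following a specific policy more concrete, here is a minimal sketch using tabular TD(0) on a toy corridor. It is not taken from the paper: the corridor, the "always step right" policy, the cumulant, the continuation function, and every name in the code are hypothetical choices made for illustration. The learned value at each cell can be read, loosely, as an affordance-style prediction of how reachable the goal is from that cell under the policy.

```python
# Tabular TD(0) estimate of a general value function (GVF) on a toy corridor.
# Everything here (corridor, policy, cumulant, names) is a hypothetical
# illustration, not the setup used in the paper.

N_STATES = 5   # cells 0..4; cell 4 is the goal
GAMMA = 0.9    # continuation (discount) while still inside the corridor
ALPHA = 0.1    # TD step size


def policy(state):
    """Target policy whose outcome is predicted: always step right."""
    return +1


def step(state, action):
    """Deterministic corridor dynamics, clipped at the walls."""
    return min(max(state + action, 0), N_STATES - 1)


def cumulant(next_state):
    """Signal being accumulated: 1 when the goal cell is reached, else 0."""
    return 1.0 if next_state == N_STATES - 1 else 0.0


def continuation(next_state):
    """The prediction terminates (continuation 0) once the goal is reached."""
    return 0.0 if next_state == N_STATES - 1 else GAMMA


def learn_gvf(episodes=2000):
    """Run TD(0) updates along trajectories generated by the target policy."""
    v = [0.0] * N_STATES
    for _ in range(episodes):
        s = 0
        while s != N_STATES - 1:
            s_next = step(s, policy(s))
            target = cumulant(s_next) + continuation(s_next) * v[s_next]
            v[s] += ALPHA * (target - v[s])
            s = s_next
    return v


values = learn_gvf()
```

In this sketch the estimates approach GAMMA raised to the number of steps remaining before the final step into the goal, so cells closer to the goal receive higher predictions. A deep learning version, as the abstract mentions, would replace the table `v` with a function approximator over sensor inputs.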
