Paper Title
Relative Variational Intrinsic Control
Paper Authors
Paper Abstract
In the absence of external rewards, agents can still learn useful behaviors by identifying and mastering a set of diverse skills within their environment. Existing skill learning methods use mutual information objectives to incentivize each skill to be diverse and distinguishable from the rest. However, if care is not taken to constrain the ways in which the skills are diverse, trivially diverse skill sets can arise. To ensure useful skill diversity, we propose a novel skill learning objective, Relative Variational Intrinsic Control (RVIC), which incentivizes learning skills that are distinguishable in how they change the agent's relationship to its environment. The resulting set of skills tiles the space of affordances available to the agent. We qualitatively analyze skill behaviors on multiple environments and show how RVIC skills are more useful than skills discovered by existing methods when used in hierarchical reinforcement learning.