Paper Title
Privileged Information Dropout in Reinforcement Learning
Paper Authors
Paper Abstract
Using privileged information during training can improve the sample efficiency and performance of machine learning systems. This paradigm has been applied to reinforcement learning (RL), primarily in the form of distillation or auxiliary tasks, and less commonly in the form of augmenting an agent's inputs. In this work, we investigate Privileged Information Dropout (\pid) for achieving the latter, which can be applied equally to value-based and policy-based RL algorithms. Within a simple partially-observed environment, we demonstrate that \pid outperforms alternatives for leveraging privileged information, including distillation and auxiliary tasks, and can successfully utilise different types of privileged information. Finally, we analyse its effect on the learned representations.
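The core idea described above — a dropout layer whose noise is modulated by privileged information that is only available at training time — can be sketched as follows. This is a minimal illustration, not the authors' exact architecture: the function `pid_layer`, the projection `w_alpha`, and the log-normal multiplicative noise are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def pid_layer(h, privileged=None, w_alpha=None, train=True):
    """Dropout-style multiplicative noise whose per-unit scale is
    predicted from privileged information (illustrative sketch only)."""
    if train and privileged is not None:
        # Per-unit noise scale in (0, 1), predicted from the
        # privileged input via a hypothetical linear projection.
        alpha = 1.0 / (1.0 + np.exp(-(privileged @ w_alpha)))
        noise = rng.lognormal(mean=0.0, sigma=alpha)
        return h * noise
    # At test time privileged information is unavailable,
    # so the layer becomes a deterministic identity pass.
    return h

h = np.ones(4)                            # hidden activations
x_priv = np.array([0.5, -1.0])            # hypothetical privileged features
w_alpha = rng.normal(size=(2, 4)) * 0.1   # hypothetical projection weights

out_train = pid_layer(h, x_priv, w_alpha, train=True)
out_test = pid_layer(h, train=False)
```

Because the privileged input only shapes the noise distribution during training, the same layer drops cleanly into either a value network or a policy network, which is why the method applies to both families of RL algorithms.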