Title


Fighting Copycat Agents in Behavioral Cloning from Observation Histories

Authors

Chuan Wen, Jierui Lin, Trevor Darrell, Dinesh Jayaraman, Yang Gao

Abstract


Imitation learning trains policies to map from input observations to the actions that an expert would choose. In this setting, distribution shift frequently exacerbates the effect of misattributing expert actions to nuisance correlates among the observed variables. We observe that a common instance of this causal confusion occurs in partially observed settings when expert actions are strongly correlated over time: the imitator learns to cheat by predicting the expert's previous action, rather than the next action. To combat this "copycat problem", we propose an adversarial approach to learn a feature representation that removes excess information about the previous expert action nuisance correlate, while retaining the information necessary to predict the next action. In our experiments, our approach improves performance significantly across a variety of partially observed imitation learning tasks.
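The adversarial representation idea in the abstract can be illustrated in a linear toy. This sketch is our own illustration, not the paper's method (which trains a neural encoder against a learned adversary): against a *linear* adversary, removing information about the previous action reduces to projecting out the feature direction that covaries with it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy partially observed setting: the observation packs a state signal s
# together with the previous expert action a_prev, and a_prev is a strong
# nuisance correlate of the next action (temporally smooth expert).
n = 5000
s = rng.normal(size=n)
a_prev = 0.9 * s + 0.3 * rng.normal(size=n)    # correlated nuisance
x = np.column_stack([s, a_prev])               # observation features
a_next = s + 0.1 * rng.normal(size=n)          # expert's next action

# Linear stand-in for the adversarial objective: a linear adversary can
# exploit exactly the component of x that covaries with a_prev, so we
# project that direction out of the features.
xc = x - x.mean(axis=0)
ac = a_prev - a_prev.mean()
c = xc.T @ ac / n                 # cov(x, a_prev) direction
c /= np.linalg.norm(c)
z = xc - np.outer(xc @ c, c)      # features with the nuisance direction removed

# A linear adversary now gains nothing: cov(z, a_prev) is numerically zero,
# while z still carries signal about a_next through s.
print(np.abs(z.T @ ac / n).max())
```

Because a_prev and s are themselves correlated, projecting out the nuisance direction also discards some predictive signal; the paper's adversarial training balances exactly this trade-off between removing the previous-action correlate and retaining what is needed to predict the next action.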
