Differentiable Physics Models for Real-world Offline Model-based Reinforcement Learning

Paper Title

Differentiable Physics Models for Real-world Offline Model-based Reinforcement Learning

Authors

Michael Lutter, Johannes Silberbauer, Joe Watson, Jan Peters

Abstract

A limitation of model-based reinforcement learning (MBRL) is the exploitation of errors in the learned models. Black-box models can fit complex dynamics with high fidelity, but their behavior is undefined outside of the data distribution. Physics-based models are better at extrapolating, due to the general validity of their informed structure, but underfit in the real world due to the presence of unmodeled phenomena. In this work, we demonstrate experimentally that for the offline model-based reinforcement learning setting, physics-based models can be beneficial compared to high-capacity function approximators if the mechanical structure is known. Using offline MBRL, physics-based models can learn to perform the ball in a cup (BiC) task on a physical manipulator from only 4 minutes of sampled data. We find that black-box models consistently produce unviable policies for BiC, as all predicted trajectories diverge to physically impossible states, despite having access to more data than the physics-based model. In addition, we generalize the approach of physics parameter identification from modeling holonomic multi-body systems to systems with nonholonomic dynamics using end-to-end automatic differentiation. Videos: https://sites.google.com/view/ball-in-a-cup-in-4-minutes/
