Paper Title

3D-OES: Viewpoint-Invariant Object-Factorized Environment Simulators

Paper Authors

Hsiao-Yu Fish Tung, Zhou Xian, Mihir Prabhudesai, Shamit Lal, Katerina Fragkiadaki

Paper Abstract

We propose an action-conditioned dynamics model that predicts scene changes caused by object and agent interactions in a viewpoint-invariant 3D neural scene representation space, inferred from RGB-D videos. In this 3D feature space, objects do not interfere with one another, and their appearance persists over time and across viewpoints. This permits our model to predict scenes far into the future by simply "moving" 3D object features based on cumulative object motion predictions. Object motion predictions are computed by a graph neural network that operates over the object features extracted from the 3D neural scene representation. Our model's simulations can be decoded by a neural renderer into 2D image views from any desired viewpoint, which aids the interpretability of our latent 3D simulation space. We show that our model generalizes its predictions well across varying numbers and appearances of interacting objects, as well as across camera viewpoints, outperforming existing 2D and 3D dynamics models. We further demonstrate sim-to-real transfer of the learned dynamics by applying our model, trained solely in simulation, to model-based control for pushing objects to desired locations under clutter on a real robotic setup.
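The abstract describes a rollout mechanism in which a graph neural network predicts per-object rigid motions from the factorized 3D object features and the agent's action, the predicted motions are accumulated over time, and the persistent object features are warped accordingly rather than re-predicted from pixels. The sketch below illustrates that loop in PyTorch; it is a minimal illustration under our own assumptions, not the authors' released code, and the names ObjectDynamicsGNN, rollout, and apply_motion are hypothetical.

```python
# Minimal sketch (PyTorch) of the object-factorized rollout described above.
# All names and tensor shapes are illustrative assumptions, not the paper's API.
import torch
import torch.nn as nn


class ObjectDynamicsGNN(nn.Module):
    """Fully-connected graph network over object nodes: each node carries a
    3D-derived object feature, edges exchange messages between object pairs,
    and the node update predicts a rigid motion (3 translation + 3 rotation
    parameters) for every object, conditioned on the agent's action."""

    def __init__(self, feat_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden)
        )
        self.node_mlp = nn.Sequential(
            nn.Linear(feat_dim + hidden + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 6),  # (dx, dy, dz, droll, dpitch, dyaw) per object
        )

    def forward(self, obj_feats: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # obj_feats: (N, feat_dim) object features; action: (action_dim,) agent push.
        n = obj_feats.size(0)
        senders = obj_feats.unsqueeze(1).expand(n, n, -1)    # (N, N, feat_dim)
        receivers = obj_feats.unsqueeze(0).expand(n, n, -1)  # (N, N, feat_dim)
        messages = self.edge_mlp(torch.cat([senders, receivers], dim=-1)).sum(dim=1)
        act = action.unsqueeze(0).expand(n, -1)
        return self.node_mlp(torch.cat([obj_feats, messages, act], dim=-1))  # (N, 6)


def rollout(gnn, obj_feats, actions, apply_motion):
    """Simulate a trajectory by accumulating predicted motions and warping the
    persistent 3D object features once per step (no re-perception). apply_motion
    is a hypothetical stand-in for rigidly transforming features in 3D."""
    cumulative = torch.zeros(obj_feats.size(0), 6)
    states = []
    for action in actions:
        cumulative = cumulative + gnn(obj_feats, action)  # cumulative object motion
        states.append(apply_motion(obj_feats, cumulative))
    return states  # each state could be passed to a neural renderer for 2D views
```

Because the object features persist across time and viewpoints, long-horizon prediction in this sketch reduces to composing rigid motions, which is the property the abstract attributes to the 3D factorization.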
