Learning to Imitate Object Interactions from Internet Videos

Paper Title

Learning to Imitate Object Interactions from Internet Videos

Authors

Austin Patel, Andrew Wang, Ilija Radosavovic, Jitendra Malik

Abstract

We study the problem of imitating object interactions from Internet videos. This requires understanding the hand-object interactions in 4D, spatially in 3D and over time, which is challenging due to mutual hand-object occlusions. In this paper we make two main contributions: (1) a novel reconstruction technique RHOV (Reconstructing Hands and Objects from Videos), which reconstructs 4D trajectories of both the hand and the object using 2D image cues and temporal smoothness constraints; (2) a system for imitating object interactions in a physics simulator with reinforcement learning. We apply our reconstruction technique to 100 challenging Internet videos. We further show that we can successfully imitate a range of different object interactions in a physics simulator. Our object-centric approach is not limited to human-like end-effectors and can learn to imitate object interactions using different embodiments, like a robotic arm with a parallel jaw gripper.
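The abstract mentions temporal smoothness constraints without giving their form. As a rough illustration only, one common way to express such a constraint is to penalize the squared second differences (discrete accelerations) of a trajectory over time. The function name `smoothness_penalty` and the sample trajectories below are hypothetical and are not taken from the paper, whose actual formulation may differ.

```python
def smoothness_penalty(trajectory):
    """Sum of squared second differences over a sequence of 3D points.

    Illustrative assumption: a generic smoothness term, not the paper's
    actual objective. `trajectory` is a list of (x, y, z) tuples, one per frame.
    """
    total = 0.0
    # Slide a window of three consecutive frames along the trajectory.
    for prev, cur, nxt in zip(trajectory, trajectory[1:], trajectory[2:]):
        for p, c, n in zip(prev, cur, nxt):
            accel = p - 2.0 * c + n  # discrete second derivative per axis
            total += accel * accel
    return total


# Hypothetical object positions over four frames.
smooth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
jittery = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0), (2.0, -0.5, 0.0), (3.0, 0.0, 0.0)]

print(smoothness_penalty(smooth))
print(smoothness_penalty(jittery))
```

A constant-velocity trajectory incurs no penalty, while frame-to-frame jitter is penalized, which is why a term of this kind can help when per-frame estimates are noisy because of occlusion.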
