Paper Title
Learning to Imitate Object Interactions from Internet Videos
Paper Authors
Paper Abstract
We study the problem of imitating object interactions from Internet videos. This requires understanding the hand-object interactions in 4D, spatially in 3D and over time, which is challenging due to mutual hand-object occlusions. In this paper we make two main contributions: (1) a novel reconstruction technique RHOV (Reconstructing Hands and Objects from Videos), which reconstructs 4D trajectories of both the hand and the object using 2D image cues and temporal smoothness constraints; (2) a system for imitating object interactions in a physics simulator with reinforcement learning. We apply our reconstruction technique to 100 challenging Internet videos. We further show that we can successfully imitate a range of different object interactions in a physics simulator. Our object-centric approach is not limited to human-like end-effectors and can learn to imitate object interactions using different embodiments, like a robotic arm with a parallel jaw gripper.