Paper Title

EgoEnv: Human-centric environment representations from egocentric video

Paper Authors

Tushar Nagarajan, Santhosh Kumar Ramakrishnan, Ruta Desai, James Hillis, Kristen Grauman

Paper Abstract

First-person video highlights a camera-wearer's activities in the context of their persistent environment. However, current video understanding approaches reason over visual features from short video clips that are detached from the underlying physical space and capture only what is immediately visible. To facilitate human-centric environment understanding, we present an approach that links egocentric video and the environment by learning representations that are predictive of the camera-wearer's (potentially unseen) local surroundings. We train such models using videos from agents in simulated 3D environments where the environment is fully observable, and test them on human-captured real-world videos from unseen environments. On two human-centric video tasks, we show that models equipped with our environment-aware features consistently outperform their counterparts with traditional clip features. Moreover, despite being trained exclusively on simulated videos, our approach successfully handles real-world videos from HouseTours and Ego4D, and achieves state-of-the-art results on the Ego4D NLQ challenge. Project page: https://vision.cs.utexas.edu/projects/ego-env/
