Paper Title

PANDA: A Gigapixel-level Human-centric Video Dataset

Authors

Xueyang Wang, Xiya Zhang, Yinheng Zhu, Yuchen Guo, Xiaoyun Yuan, Liuyu Xiang, Zerun Wang, Guiguang Ding, David J. Brady, Qionghai Dai, Lu Fang

Abstract

We present PANDA, the first gigaPixel-level humAN-centric viDeo dAtaset, for large-scale, long-term, and multi-object visual analysis. The videos in PANDA were captured by a gigapixel camera and cover real-world scenes with both wide field-of-view (~1 square kilometer area) and high-resolution details (~gigapixel-level/frame). The scenes may contain 4k head counts with over 100x scale variation. PANDA provides enriched and hierarchical ground-truth annotations, including 15,974.6k bounding boxes, 111.8k fine-grained attribute labels, 12.7k trajectories, 2.2k groups and 2.9k interactions. We benchmark the human detection and tracking tasks. Due to the vast variance of pedestrian pose, scale, occlusion and trajectory, existing approaches are challenged by both accuracy and efficiency. Given the uniqueness of PANDA with both wide FoV and high resolution, a new task of interaction-aware group detection is introduced. We design a 'global-to-local zoom-in' framework, where global trajectories and local interactions are simultaneously encoded, yielding promising results. We believe PANDA will contribute to the community of artificial intelligence and praxeology by understanding human behaviors and interactions in large-scale real-world scenes. PANDA Website: http://www.panda-dataset.com.
