Paper Title

Tragic Talkers: A Shakespearean Sound- and Light-Field Dataset for Audio-Visual Machine Learning Research

Authors

Davide Berghi, Marco Volino, Philip J. B. Jackson

Abstract

3D audio-visual production aims to deliver immersive and interactive experiences to the consumer. Yet, faithfully reproducing real-world 3D scenes remains a challenging task. This is partly due to the lack of available datasets enabling audio-visual research in this direction. In most of the existing multi-view datasets, the accompanying audio is neglected. Similarly, datasets for spatial audio research primarily offer unimodal content, and when visual data is included, the quality is far from meeting the standard production needs. We present "Tragic Talkers", an audio-visual dataset consisting of excerpts from the "Romeo and Juliet" drama captured with microphone arrays and multiple co-located cameras for light-field video. Tragic Talkers provides ideal content for object-based media (OBM) production. It is designed to cover various conventional talking scenarios, such as monologues, two-people conversations, and interactions with considerable movement and occlusion, yielding 30 sequences captured from a total of 22 different points of view and two 16-element microphone arrays. Additionally, we provide voice activity labels, 2D face bounding boxes for each camera view, 2D pose detection keypoints, 3D tracking data of the mouth of the actors, and dialogue transcriptions. We believe the community will benefit from this dataset as it can assist multidisciplinary research. Possible uses of the dataset are discussed.
