Paper Title

Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel's Weekly Video Podcasts

Authors

Debjoy Saha, Shravan Nayak, Timo Baumann

Abstract

We introduce the Merkel Podcast Corpus, an audio-visual-text corpus in German collected from 16 years of (almost) weekly Internet podcasts of former German chancellor Angela Merkel. To the best of our knowledge, this is the first single-speaker corpus in German that covers the audio, visual and text modalities at comparable size and temporal extent. We describe how we collected and edited the data: downloading the videos, transcripts and other metadata, performing forced alignment, and running active speaker recognition and face detection, to finally curate a single-speaker dataset consisting of utterances spoken by Angela Merkel. The proposed pipeline is general and can be used to curate other datasets of a similar nature, such as talk show content. Through various statistical analyses and applications of the dataset in talking face generation and TTS, we show the utility of the dataset. We argue that it is a valuable contribution to the research community, in particular because of its realistic and challenging material at the boundary between prepared and spontaneous speech.
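The final curation step in the abstract (keeping only utterances spoken by the target speaker) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that forced alignment has already produced utterance time spans and that active speaker recognition plus face detection have produced non-overlapping time spans in which the target speaker is visibly speaking. The names `Interval`, `keep_target_utterances` and the `min_coverage` threshold are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A time span in seconds (hypothetical helper type)."""
    start: float
    end: float


def overlap(a: Interval, b: Interval) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def keep_target_utterances(utterances, speaking_spans, min_coverage=0.9):
    """Keep utterances mostly covered by spans where the target speaker talks.

    utterances:     intervals from forced alignment (assumed input)
    speaking_spans: non-overlapping intervals in which the target speaker is
                    detected as the active speaker (assumed input)
    min_coverage:   assumed threshold, not a value taken from the paper
    """
    kept = []
    for utt in utterances:
        duration = utt.end - utt.start
        if duration <= 0:
            continue  # skip degenerate alignments
        covered = sum(overlap(utt, span) for span in speaking_spans)
        if covered / duration >= min_coverage:
            kept.append(utt)
    return kept


if __name__ == "__main__":
    utterances = [Interval(0.0, 4.0), Interval(5.0, 9.0), Interval(10.0, 12.0)]
    speaking_spans = [Interval(0.0, 4.5), Interval(8.0, 12.0)]
    for utt in keep_target_utterances(utterances, speaking_spans):
        print(utt)
```

A coverage threshold rather than strict containment tolerates small boundary errors from alignment and detection. The appropriate value depends on the data and would need tuning.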
