Multitask Detection of Speaker Changes, Overlapping Speech and Voice Activity Using wav2vec 2.0

Paper title

Multitask Detection of Speaker Changes, Overlapping Speech and Voice Activity Using wav2vec 2.0

Authors

Marie Kunešová, Zbyněk Zajíc

Abstract

Self-supervised learning approaches have lately achieved great success on a broad spectrum of machine learning problems. In the field of speech processing, one of the most successful recent self-supervised models is wav2vec 2.0. In this paper, we explore the effectiveness of this model on three basic speech classification tasks: speaker change detection, overlapped speech detection, and voice activity detection. First, we concentrate on only one task -- speaker change detection -- where our proposed system surpasses the previously reported results on four different corpora, and achieves comparable performance even when trained on out-of-domain data from an artificially designed dataset. Then we expand our approach to tackle all three tasks in a single multitask system with state-of-the-art performance on the AMI corpus. The implementation of the algorithms in this paper is publicly available at https://github.com/mkunes/w2v2_audioFrameClassification.
