Paper title
Look who's not talking
Paper authors
Abstract
The objective of this work is speaker diarisation of speech recordings 'in the wild'. Determining which segments contain speech is a crucial part of diarisation systems, and errors at this step account for a large proportion of the overall diarisation error. In this paper, we present a simple but effective solution for speech activity detection based on speaker embeddings. In particular, we find that the norm of the speaker embedding is an extremely effective indicator of speech activity. The method does not require an independent model for speech activity detection, and therefore allows speaker diarisation to be performed using a unified representation for both speaker modelling and speech activity detection. We perform a number of experiments on in-house and public datasets, on which our method outperforms popular baselines.
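As a rough illustration of how the embedding norm could act as a speech activity detector, the sketch below thresholds the L2 norm of frame-level speaker embeddings and merges consecutive speech frames into segments. It is a minimal sketch under stated assumptions, not the authors' implementation. The embeddings are assumed to come from some speaker encoder and to be taken before any length normalisation (otherwise every norm would be 1). The helper names, the fixed threshold and the frame hop are hypothetical; a real threshold would presumably be tuned on development data.

```python
import math
from typing import List, Optional, Sequence, Tuple

Embedding = Sequence[float]


def embedding_norms(embeddings: Sequence[Embedding]) -> List[float]:
    """Return the L2 norm of each frame-level speaker embedding."""
    return [math.sqrt(sum(x * x for x in e)) for e in embeddings]


def speech_mask(embeddings: Sequence[Embedding], threshold: float) -> List[bool]:
    """Hypothetical decision rule: a frame is speech if its embedding norm exceeds `threshold`.

    Assumes the embeddings were NOT length-normalised beforehand.
    """
    return [n > threshold for n in embedding_norms(embeddings)]


def mask_to_segments(mask: Sequence[bool], hop: float) -> List[Tuple[float, float]]:
    """Merge runs of consecutive speech frames into (start, end) times in seconds.

    `hop` is an assumed frame step in seconds between successive embeddings.
    """
    segments: List[Tuple[float, float]] = []
    start: Optional[int] = None
    for i, is_speech in enumerate(mask):
        if is_speech and start is None:
            start = i  # a speech run begins
        elif not is_speech and start is not None:
            segments.append((start * hop, i * hop))  # the run ended at frame i
            start = None
    if start is not None:  # close a run that lasts until the final frame
        segments.append((start * hop, len(mask) * hop))
    return segments


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real speaker embeddings have hundreds of dimensions.
    embs = [[0.1, 0.0], [3.0, 4.0], [2.0, 2.0], [0.2, 0.1], [6.0, 8.0]]
    mask = speech_mask(embs, threshold=1.0)
    print(mask)                               # [False, True, True, False, True]
    print(mask_to_segments(mask, hop=0.5))    # [(0.5, 1.5), (2.0, 2.5)]
    # The same vectors that passed the test can go straight to speaker clustering.
    speech_embs = [e for e, m in zip(embs, mask) if m]
```

Because frames judged as speech keep their embeddings, the same vectors can be passed directly to a clustering step. That is the sense in which a single representation serves both speaker modelling and speech activity detection, with no separate voice activity model.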