Paper Title
Speaker-Conditional Chain Model for Speech Separation and Extraction
Authors
Abstract
Speech separation has been extensively explored as a way to tackle the cocktail party problem. However, existing approaches still generalize poorly to real scenarios. In this work, we propose a general strategy named the Speaker-Conditional Chain Model to process complex speech recordings. In the proposed method, the model first infers the identities of a variable number of speakers from the observation using a sequence-to-sequence model. It then takes the information from the inferred speakers as conditions to extract their speech sources. Because the speaker information is predicted from the whole observation, the model helps address both conventional speech separation and speaker extraction for multi-round long recordings. Experiments on standard fully-overlapped speech separation benchmarks show results comparable to prior studies, while the proposed model adapts better to multi-round long recordings.
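The two-stage control flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `infer_speakers`, `extract_sources`, `toy_step`, and `toy_extract` are hypothetical, and the neural sequence-to-sequence and extraction networks are replaced by toy stand-ins that treat each speaker as occupying one frequency bin. Only the chain structure is meant to carry over: speakers are emitted one at a time until a stop decision, and each inferred speaker then conditions the extraction of its source.

```python
from typing import Callable, List, Optional

Frame = List[float]      # toy feature: energy per frequency bin for one time frame
Mixture = List[Frame]    # a recording is a list of frames


def infer_speakers(
    mixture: Mixture,
    step: Callable[[Mixture, List[int]], Optional[int]],
    max_speakers: int = 8,
) -> List[int]:
    """Stage 1: emit one speaker at a time, conditioned on the speakers
    already emitted, until `step` returns None (the stop decision)."""
    speakers: List[int] = []
    while len(speakers) < max_speakers:
        nxt = step(mixture, speakers)
        if nxt is None:
            break
        speakers.append(nxt)
    return speakers


def extract_sources(
    mixture: Mixture,
    speakers: List[int],
    extract: Callable[[Mixture, int], Mixture],
) -> List[Mixture]:
    """Stage 2: extract one source per inferred speaker, using that
    speaker's representation as the condition."""
    return [extract(mixture, spk) for spk in speakers]


# --- Toy stand-ins (assumptions for illustration only) ---

def toy_step(mixture: Mixture, found: List[int]) -> Optional[int]:
    """Pick the most energetic bin not yet claimed; stop when none is active."""
    n_bins = len(mixture[0])
    totals = [sum(frame[b] for frame in mixture) for b in range(n_bins)]
    candidates = [b for b in range(n_bins) if b not in found and totals[b] > 0.5]
    if not candidates:
        return None
    return max(candidates, key=lambda b: totals[b])


def toy_extract(mixture: Mixture, speaker: int) -> Mixture:
    """Keep only the bin that belongs to the given speaker (a hard mask)."""
    return [[v if b == speaker else 0.0 for b, v in enumerate(frame)]
            for frame in mixture]


# A four-frame "recording": one speaker in bin 0, another in bin 2,
# overlapping in the second frame; the bin-0 speaker returns in the last frame.
mixture = [
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 2.0],
    [0.0, 0.0, 2.0],
    [1.0, 0.0, 0.0],
]

speakers = infer_speakers(mixture, toy_step)
sources = extract_sources(mixture, speakers, toy_extract)
```

Because stage 1 looks at the whole recording before any extraction happens, the speaker who returns in the last frame is assigned to the same output as in the first frame, which is the property the abstract highlights for multi-round recordings.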