Paper Title
Learning Dependencies of Discrete Speech Representations with Neural Hidden Markov Models
Paper Authors
Paper Abstract
While discrete latent variable models have had great success in self-supervised learning, most models assume that frames are independent. Due to the segmental nature of phonemes in speech perception, modeling dependencies among latent variables at the frame level can potentially improve the representations learned for phonetic tasks. In this work, we assume Markovian dependencies among latent variables and propose to learn speech representations with neural hidden Markov models. Our general framework allows us to compare against self-supervised models that assume independence while keeping the number of parameters fixed. The added dependencies improve the accessibility of phonetic information, phonetic segmentation, and the cluster purity of phones, showcasing the benefit of the assumed dependencies.
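
To make the core idea concrete, the sketch below shows how a neural HMM can marginalize over discrete frame-level states with the forward algorithm, so that Markovian dependencies between states are part of the training objective. This is a minimal illustration, not the authors' implementation: the class name `NeuralHMM`, the softmax emission scorer (a hybrid-style approximation of the emission probability), the learned transition matrix, and the dimensions are all assumptions made for the example.

```python
# Minimal sketch (illustrative, not the paper's code) of a neural HMM
# over discrete latent states, trained by maximizing the log marginal
# likelihood computed with the forward algorithm in log space.
import torch
import torch.nn as nn

class NeuralHMM(nn.Module):
    def __init__(self, num_states: int, feat_dim: int):
        super().__init__()
        # Unnormalized transition scores between discrete states; the
        # Markovian dependency lives here.
        self.transition = nn.Parameter(torch.zeros(num_states, num_states))
        self.initial = nn.Parameter(torch.zeros(num_states))
        # Neural emission scorer: maps each frame feature to a score per
        # state (a hybrid-style stand-in for an emission model).
        self.emission = nn.Linear(feat_dim, num_states)

    def log_marginal(self, frames: torch.Tensor) -> torch.Tensor:
        """Forward algorithm in log space. frames: (T, feat_dim)."""
        emit = torch.log_softmax(self.emission(frames), dim=-1)    # (T, K)
        trans = torch.log_softmax(self.transition, dim=-1)         # (K, K)
        alpha = torch.log_softmax(self.initial, dim=-1) + emit[0]  # (K,)
        for t in range(1, frames.size(0)):
            # Marginalize over the previous state for every current state.
            alpha = torch.logsumexp(alpha.unsqueeze(1) + trans, dim=0) + emit[t]
        return torch.logsumexp(alpha, dim=0)  # log p(x_1, ..., x_T)

# Hypothetical usage: 100 frames of 39-dim features (e.g., MFCCs).
model = NeuralHMM(num_states=50, feat_dim=39)
x = torch.randn(100, 39)
loss = -model.log_marginal(x)
loss.backward()
```

The log-space recursion (`logsumexp` rather than products of probabilities) is the standard way to keep the forward algorithm numerically stable over long frame sequences; dropping the transition term and summing per-frame scores would recover a model that treats frames as independent, which is the comparison the abstract describes.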