Paper title
Semi-supervised learning of glottal pulse positions in a neural analysis-synthesis framework
Paper authors
Paper abstract
This article investigates recently emerging approaches that use deep neural networks to estimate glottal closure instants (GCI). We build upon our previous approach, which used synthetic speech exclusively to create perfectly annotated training data and was shown to compare favourably with other training approaches that use electroglottograph (EGG) signals. Here we introduce a semi-supervised training strategy that refines the estimator by means of an analysis-synthesis setup using real speech signals, for which GCI ground truth does not exist. The analyser is evaluated by comparing the GCI extracted from the glottal flow signal it generates with the GCI extracted from EGG on the CMU Arctic dataset, where EGG signals were recorded in addition to speech. We observe that (1) the artificial increase in the diversity of pulse shapes used in our previous construction of the synthetic database is beneficial, (2) training the GCI network in the analysis-synthesis setup yields a very significant improvement of the GCI analyser, and (3) additional regularisation strategies further improve the final analysis network when it is trained in the analysis-synthesis setup.
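The evaluation described above compares GCIs estimated from speech with reference GCIs derived from EGG. As a rough illustration only, the sketch below scores a set of estimated instants against reference instants by nearest-neighbour matching within a tolerance. The function name `match_gci`, the 1 ms default tolerance, and the hit-rate metric are assumptions made for this example and are not taken from the paper, whose actual evaluation protocol may differ. For simplicity, one estimate may be matched to more than one reference instant.

```python
import numpy as np


def match_gci(estimated, reference, tolerance=0.001):
    """Match each reference GCI (seconds) to its nearest estimated GCI.

    Hypothetical helper for illustration. Returns the fraction of reference
    instants that have an estimate within `tolerance` seconds, and the signed
    timing errors (estimate minus reference) of those matched instants.
    """
    estimated = np.sort(np.asarray(estimated, dtype=float))
    reference = np.asarray(reference, dtype=float)
    if estimated.size == 0 or reference.size == 0:
        return 0.0, np.empty(0)

    # Index of the first estimate that is >= each reference instant.
    idx = np.searchsorted(estimated, reference)
    # Candidate neighbours on either side, clipped to valid indices.
    left = estimated[np.clip(idx - 1, 0, estimated.size - 1)]
    right = estimated[np.clip(idx, 0, estimated.size - 1)]
    # Pick whichever neighbour is closer to the reference instant.
    nearest = np.where(np.abs(reference - left) <= np.abs(right - reference),
                       left, right)

    errors = nearest - reference
    hits = np.abs(errors) <= tolerance
    return float(hits.mean()), errors[hits]


if __name__ == "__main__":
    # Made-up instants in seconds, purely to exercise the function.
    reference = [0.010, 0.020, 0.030, 0.040]
    estimated = [0.0102, 0.0197, 0.0330]
    rate, errs = match_gci(estimated, reference)
    print(f"hit rate: {rate:.2f}, matched errors (s): {errs}")
```

A fuller evaluation would typically also count false alarms (estimates with no nearby reference) and enforce one-to-one matching; both are left out here to keep the sketch short.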