Paper Title
A New Approach to Accent Recognition and Conversion for Mandarin Chinese
Paper Authors
Abstract
Two new approaches are presented and explored: one for accent classification and one for accent conversion. The first topic is Chinese accent classification (recognition). The second topic is the use of encoder-decoder models for end-to-end Chinese accent conversion, where the classifier from the first topic is used to train the encoder-decoder accent converter. Accent recognition experiments were performed with different features and models. The features include MFCCs and spectrograms; the classifier models were a TDNN and a 1D-CNN. On the MAGICDATA dataset with 5 accent classes, the TDNN classifier trained on MFCC features achieved a test accuracy of 54% and a test F1 score of 0.54, while the 1D-CNN classifier trained on spectrograms achieved a test accuracy of 62% and a test F1 score of 0.62. A prototype of an end-to-end accent converter model is also presented. The converter comprises an encoder and a decoder. The encoder converts an accented input into an accent-neutral form. The decoder converts the accent-neutral form into an accented form, with the target accent specified by an input accent label. The converter prototype preserves tone but loses detail in the output audio. The encoder-decoder structure nevertheless shows potential as an effective accent converter. Future improvements are proposed to address the loss of detail in the decoder output.
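The accuracy and F1 figures reported above can be illustrated with a minimal sketch of how such metrics might be computed for a 5-class accent classifier. The abstract does not state which F1 averaging was used, so macro averaging is an assumption here; the function names and the toy labels are hypothetical and are not taken from the paper or from MAGICDATA.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of utterances whose predicted accent matches the true accent."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred, num_classes=5):
    """Unweighted mean of per-class F1 scores (macro averaging is assumed)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the utterance was class t
            fn[t] += 1  # class t utterance was missed
    scores = []
    for c in range(num_classes):
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / num_classes


# Hypothetical toy labels: integers 0..4 stand in for the 5 accent classes.
y_true = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
y_pred = [0, 1, 2, 3, 4, 1, 1, 2, 0, 4]

print("accuracy:", accuracy(y_true, y_pred))
print("macro F1:", macro_f1(y_true, y_pred))
```

With balanced classes, accuracy and macro F1 tend to land close together, which is consistent with the matching pairs reported above (54% and 0.54, 62% and 0.62), although that alone does not confirm the averaging scheme used.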