Assessing Phrase Break of ESL Speech with Pre-trained Language Models

Paper title

Assessing Phrase Break of ESL Speech with Pre-trained Language Models

Authors

Wang, Zhiyi; Mao, Shaoguang; Wu, Wenshan; Xia, Yan

Abstract

This work introduces an approach to assessing phrase breaks in ESL learners' speech using pre-trained language models (PLMs). Unlike traditional methods, the proposal converts speech to token sequences and then leverages the power of PLMs. There are two sub-tasks: overall assessment of phrase breaks for a speech clip, and fine-grained assessment of every possible phrase break position. Speech input is first force-aligned with its text, then pre-processed into a token sequence that includes words and associated phrase break information. The token sequence is then fed into a pre-training and fine-tuning pipeline. In pre-training, a replaced break token detection module is trained on token data in which each token has a certain percentage chance of being randomly replaced. In fine-tuning, overall and fine-grained scoring are optimized with text classification and sequence labeling pipelines, respectively. With the introduction of PLMs, the dependence on labeled training data is greatly reduced and performance improves.
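The pre-processing and pre-training steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The break-token names (`<b0>`, `<b1>`, `<b2>`), the pause thresholds, the 15% replacement rate, and all function names are hypothetical and chosen for illustration only. The sketch also assumes that only break tokens are corrupted, which the abstract does not state explicitly.

```python
import random

# Hypothetical break-token vocabulary. The abstract does not name the tokens
# or the pause thresholds, so these are illustrative assumptions.
BREAK_TOKENS = ["<b0>", "<b1>", "<b2>"]  # no break, short break, long break


def pause_to_break_token(pause_sec, short=0.05, long=0.3):
    """Bucket the silence after a word into a break token (assumed thresholds)."""
    if pause_sec < short:
        return BREAK_TOKENS[0]
    if pause_sec < long:
        return BREAK_TOKENS[1]
    return BREAK_TOKENS[2]


def to_token_sequence(aligned_words):
    """Interleave words with break tokens.

    aligned_words: list of (word, start_sec, end_sec) tuples, as a forced
    aligner might produce them.
    """
    tokens = []
    for i, (word, _start, end) in enumerate(aligned_words):
        tokens.append(word)
        if i + 1 < len(aligned_words):
            next_start = aligned_words[i + 1][1]
            tokens.append(pause_to_break_token(next_start - end))
    return tokens


def corrupt_break_tokens(tokens, replace_prob=0.15, rng=None):
    """Randomly replace break tokens and return (corrupted_tokens, labels).

    A label of 1 marks a replaced break token; 0 marks everything else.
    A detector trained on these pairs learns to spot implausible breaks.
    """
    rng = rng or random.Random()
    corrupted, labels = [], []
    for tok in tokens:
        if tok in BREAK_TOKENS and rng.random() < replace_prob:
            # Always swap in a different break token so the label is truthful.
            alternatives = [b for b in BREAK_TOKENS if b != tok]
            corrupted.append(rng.choice(alternatives))
            labels.append(1)
        else:
            corrupted.append(tok)
            labels.append(0)
    return corrupted, labels
```

The labels produced by `corrupt_break_tokens` are per-token binary targets, so the same sequence-labeling setup could in principle be reused for the fine-grained scoring task during fine-tuning, with human-assigned scores in place of the synthetic labels.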
