Paper Title

3M: An Effective Multi-view, Multi-granularity, and Multi-aspect Modeling Approach to English Pronunciation Assessment

Authors

Fu-An Chao, Tien-Hong Lo, Tzu-I Wu, Yao-Ting Sung, Berlin Chen

Abstract

As an indispensable ingredient of computer-assisted pronunciation training (CAPT), automatic pronunciation assessment (APA) plays a pivotal role in aiding self-directed language learners by providing multi-aspect and timely feedback. However, at least two potential obstacles might hinder its performance in practical use. On the one hand, most studies focus exclusively on segmental (phonetic)-level features such as goodness of pronunciation (GOP); this may cause a mismatch in feature granularity when performing suprasegmental (prosodic)-level pronunciation assessment. On the other hand, APA still suffers from a lack of large-scale labeled speech data from non-native speakers, which inevitably limits its performance. In this paper, we tackle these problems by integrating multiple prosodic and phonological features to provide multi-view, multi-granularity, and multi-aspect (3M) pronunciation modeling. Specifically, we augment GOP with prosodic and self-supervised learning (SSL) features, and develop a vowel/consonant positional embedding for more phonology-aware automatic pronunciation assessment. A series of experiments conducted on the publicly available speechocean762 dataset shows that our approach obtains significant improvements at several assessment granularities in comparison with previous work, especially in the assessment of speaking fluency and speech prosody.
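
To make the idea of augmenting GOP with other feature views concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simplified GOP (the mean log posterior of the canonical phone over its aligned frames), hypothetical prosodic and SSL vectors supplied by the caller, and a plain vowel/consonant flag standing in for the learned vowel/consonant positional embedding. All function names, the vowel set, and the example values are illustrative assumptions.

```python
import math

# Assumed ARPAbet-style vowel inventory (stress digits stripped); illustrative only.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}


def gop(frame_posteriors):
    """Simplified GOP: mean log posterior of the canonical phone over its frames."""
    return sum(math.log(p) for p in frame_posteriors) / len(frame_posteriors)


def vowel_consonant_flag(phone):
    """Return 1.0 for vowels and 0.0 for consonants.

    This is a hand-coded stand-in for a learned vowel/consonant embedding.
    """
    return 1.0 if phone.rstrip("012") in VOWELS else 0.0


def phone_features(phone, frame_posteriors, prosodic, ssl):
    """Concatenate the segmental, prosodic, and SSL views into one phone-level vector."""
    return ([gop(frame_posteriors)]
            + list(prosodic)
            + list(ssl)
            + [vowel_consonant_flag(phone)])


# Hypothetical inputs: two frame posteriors, two prosodic values, two SSL dimensions.
features = phone_features("AE1", [0.9, 0.8], [0.12, 0.45], [0.1, -0.2])
```

In this sketch each phone yields a single vector whose first entry is the segmental score, followed by the prosodic and SSL views and the vowel/consonant indicator. A real system would feed such vectors to a neural scorer and would learn the embedding rather than hard-coding it.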
