Paper Title

AccentDB: A Database of Non-Native English Accents to Assist Neural Speech Recognition

Authors

Afroz Ahamad, Ankit Anand, Pranesh Bhargava

Abstract

Modern Automatic Speech Recognition (ASR) technology has evolved to recognize speech by native speakers of a language very well. However, recognizing speech by non-native speakers remains a major challenge. In this work, we first spell out the key requirements for creating a well-curated database of speech samples in non-native accents for training and testing robust ASR systems. We then introduce AccentDB, one such database, which contains samples of 4 Indian-English accents collected by us, along with a compilation of samples from 4 native-English accents and 1 metropolitan Indian-English accent. We also present an analysis of the separability of the collected accent data. Further, we present several accent classification models and evaluate them thoroughly against human-labelled accent classes. We test the generalization of our classifier models in a variety of setups involving seen and unseen data. Finally, we introduce the task of neutralizing non-native accents into native accents using autoencoder models with task-specific architectures. Thus, our work aims to aid ASR systems at every stage of development with a database for training, classification models for feature augmentation, and neutralization systems for acoustic transformations of non-native accents of English.