Automatic Dialect Density Estimation for African American English

Paper Title

Automatic Dialect Density Estimation for African American English

Authors

Alexander Johnson, Kevin Everson, Vijay Ravi, Anissa Gladney, Mari Ostendorf, Abeer Alwan

Abstract

In this paper, we explore automatic prediction of dialect density of the African American English (AAE) dialect, where dialect density is defined as the percentage of words in an utterance that contain characteristics of the non-standard dialect. We investigate several acoustic and language modeling features, including the commonly used X-vector representation and ComParE feature set, in addition to information extracted from ASR transcripts of the audio files and prosodic information. To address issues of limited labeled data, we use a weakly supervised model to project prosodic and X-vector features into low-dimensional task-relevant representations. An XGBoost model is then used to predict the speaker's dialect density from these features and show which are most significant during inference. We evaluate the utility of these features both alone and in combination for the given task. This work, which does not rely on hand-labeled transcripts, is performed on audio segments from the CORAAL database. We show a significant correlation between our predicted and ground truth dialect density measures for AAE speech in this database and propose this work as a tool for explaining and mitigating bias in speech technology.
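The definition of dialect density in the abstract (the percentage of words in an utterance that carry a non-standard dialect feature) and the comparison of predicted against ground truth values can be illustrated with a short sketch. Everything here is an assumption made for illustration: the function names `dialect_density` and `pearson`, the per-word boolean flags, and the choice of Pearson correlation, since the abstract does not say which correlation measure is used. This is not the authors' implementation.

```python
import math
from typing import Sequence


def dialect_density(word_has_feature: Sequence[bool]) -> float:
    """Return the percentage of words flagged as carrying a dialect feature.

    `word_has_feature` holds one hypothetical boolean per word in the utterance.
    """
    if not word_has_feature:
        raise ValueError("utterance has no words")
    return 100.0 * sum(word_has_feature) / len(word_has_feature)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences.

    Assumed here as the agreement measure between predicted and
    ground truth dialect density; undefined if either input is constant.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Made-up annotations for an 8-word utterance: 2 words carry a dialect feature.
flags = [False, True, False, False, True, False, False, False]
print(f"{dialect_density(flags):.1f}%")  # 25.0%
```

In the setting the abstract describes, the ground truth densities would come from annotated CORAAL segments and the predictions from the XGBoost model; `pearson` would then be applied to the two lists of per-segment values.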
