Paper Title

Phonemic Representation and Transcription for Speech to Text Applications for Under-resourced Indigenous African Languages: The Case of Kiswahili

Authors

Awino, Ebbie; Wanzare, Lilian; Muchemi, Lawrence; Wanjawa, Barack; Ombui, Edward; Indede, Florence; McOnyango, Owen; Okal, Benard

Abstract

Building automatic speech recognition (ASR) systems is a challenging task, especially for under-resourced languages, for which corpora must be constructed nearly from scratch and sufficient training data is lacking. Several African indigenous languages, including Kiswahili, are technologically under-resourced. ASR systems are crucial, particularly for hearing-impaired persons, who can benefit from having transcripts in their native languages. However, the absence of transcribed speech datasets has complicated efforts to develop ASR models for these indigenous languages. This paper explores the transcription process and the development of a Kiswahili speech corpus, which includes both read-out texts and spontaneous speech data from native Kiswahili speakers. The study also discusses the vowels and consonants in Kiswahili and provides an updated Kiswahili phoneme dictionary for the ASR model, which was created using CMU Sphinx, an open-source speech recognition toolkit. The ASR model was trained using an extended phonetic set that yielded a word error rate (WER) of 18.87% and a sentence error rate (SER) of 49.5%, an improvement over previous similar research for under-resourced languages.
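
For readers unfamiliar with the two metrics, the following is a minimal Python sketch of how WER and SER are commonly computed, assuming the standard definitions: WER is the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, and SER is the fraction of utterances with at least one error. The function names and the two sample sentence pairs are hypothetical illustrations; they are not taken from the paper's corpus or from the CMU Sphinx scoring tools, whose exact text normalization may differ.

```python
from typing import List, Tuple


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance between two word sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def wer_and_ser(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (WER, SER) over (reference, hypothesis) transcript pairs."""
    total_errors = 0
    total_ref_words = 0
    wrong_sentences = 0
    for reference, hypothesis in pairs:
        ref_words = reference.split()
        hyp_words = hypothesis.split()
        errors = word_edit_distance(ref_words, hyp_words)
        total_errors += errors
        total_ref_words += len(ref_words)
        if errors > 0:
            wrong_sentences += 1
    if total_ref_words == 0:
        raise ValueError("at least one non-empty reference is required")
    return total_errors / total_ref_words, wrong_sentences / len(pairs)


# Hypothetical sample data, for illustration only.
sample_pairs = [
    ("habari ya asubuhi", "habari ya asubuhi"),              # fully correct
    ("ninapenda kusoma vitabu", "ninapenda kusoma kitabu"),  # one substitution
]
wer, ser = wer_and_ser(sample_pairs)
print(f"WER = {wer:.2%}, SER = {ser:.2%}")
```

Because a single wrong word marks the whole utterance as incorrect, SER is typically much higher than WER, which is consistent with the gap between the two figures reported in the abstract.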
