Paper title
Context-based out-of-vocabulary word recovery for ASR systems in Indian languages
Paper authors
Paper abstract
Detecting and recovering out-of-vocabulary (OOV) words remains challenging for Automatic Speech Recognition (ASR) systems. Many existing methods model OOV words by modifying the acoustic and language models and integrating context words directly into them. Training such complex models requires a large amount of data containing the context words, takes additional training time, and increases model size. In contrast, post-processing methods that recover context-based OOV words after the ASR transcription has been obtained have not been explored much. In this work, we propose a post-processing technique to improve the performance of context-based OOV recovery. We create an acoustically boosted language model with a sub-graph built at the phone level from an OOV word list. We propose two methods for determining a suitable cost function to retrieve OOV words based on the context. The cost function is defined using phonetic and acoustic knowledge and is used to match and recover the correct context words in the decoded output. The effectiveness of the proposed cost function is evaluated at both the word level and the sentence level. The evaluation results show that this approach recovers, on average, 50% of context-based OOV words across multiple categories.
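As a rough illustration of what phone-level matching against an OOV word list can look like, the sketch below scores a decoded phone sequence against each list entry with a weighted edit distance. The abstract does not specify the paper's cost functions, so everything here is an assumption made for illustration: the phone symbols, the `SIMILAR_PHONES` groups, the 0.5 substitution discount, the length normalization, and the `max_normalized_cost` threshold are all hypothetical, and no acoustic scores are used.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical groups of acoustically similar phones (an assumption,
# not taken from the paper); confusions inside a group are cheaper.
SIMILAR_PHONES = [{"p", "b"}, {"t", "d"}, {"k", "g"}, {"s", "z"}]


def substitution_cost(a: str, b: str) -> float:
    """Return 0 for identical phones, 0.5 for similar ones, 1 otherwise."""
    if a == b:
        return 0.0
    for group in SIMILAR_PHONES:
        if a in group and b in group:
            return 0.5
    return 1.0


def phone_distance(hyp: Sequence[str], ref: Sequence[str]) -> float:
    """Weighted edit distance between two phone sequences."""
    # prev[j] holds the cost of aligning the hyp prefix seen so far
    # with the first j phones of ref.
    prev = [float(j) for j in range(len(ref) + 1)]
    for i, h in enumerate(hyp, start=1):
        curr = [float(i)]
        for j, r in enumerate(ref, start=1):
            curr.append(min(
                prev[j] + 1.0,                         # delete h
                curr[j - 1] + 1.0,                     # insert r
                prev[j - 1] + substitution_cost(h, r)  # substitute
            ))
        prev = curr
    return prev[-1]


def recover_oov(decoded_phones: Sequence[str],
                oov_lexicon: Dict[str, List[str]],
                max_normalized_cost: float = 0.3) -> Optional[str]:
    """Return the closest OOV word, or None if nothing is close enough."""
    best_word, best_cost = None, float("inf")
    for word, phones in oov_lexicon.items():
        # Normalize by reference length so long words are not penalized.
        cost = phone_distance(decoded_phones, phones) / max(len(phones), 1)
        if cost < best_cost:
            best_word, best_cost = word, cost
    return best_word if best_cost <= max_normalized_cost else None


# Toy OOV list with made-up phone transcriptions.
OOV_LEXICON = {
    "kolkata": ["k", "o", "l", "k", "a", "t", "a"],
    "pune": ["p", "u", "n", "e"],
}
```

With this toy list, `recover_oov(["g", "o", "l", "k", "a", "t", "a"], OOV_LEXICON)` selects `"kolkata"`, because the only mismatch is a discounted `g`/`k` confusion, while a sequence that resembles no entry returns `None`. The threshold trades recall of OOV words against false replacements of correctly decoded in-vocabulary words.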