Stuttering Speech Disfluency Prediction using Explainable Attribution Vectors of Facial Muscle Movements

Paper Title

Stuttering Speech Disfluency Prediction using Explainable Attribution Vectors of Facial Muscle Movements

Authors

Das, Arun; Mock, Jeffrey; Chacon, Henry; Irani, Farzan; Golob, Edward; Najafirad, Peyman

Abstract

Speech disorders such as stuttering disrupt the normal fluency of speech through involuntary repetitions, prolongations, and blocks of sounds and syllables. In addition to these disruptions of speech fluency, most adults who stutter (AWS) also experience numerous observable secondary behaviors before, during, and after a stuttering moment, often involving the facial muscles. Recent studies have explored automatic detection of stuttering with artificial intelligence (AI) based algorithms applied to respiratory rate, audio, and other signals recorded during speech. However, most of these methods require controlled environments and/or invasive wearable sensors, and they are unable to explain why a decision (fluent vs. stuttered) was made. We hypothesize that pre-speech facial activity in AWS, which can be captured non-invasively, contains enough information to accurately classify the upcoming utterance as either fluent or stuttered. To this end, this paper proposes a novel explainable AI (XAI) assisted convolutional neural network (CNN) classifier that predicts near-future stuttering by learning the temporal facial muscle movement patterns of AWS, and that explains which facial muscles and actions are involved. Statistical analyses reveal a significantly high prevalence of cheek muscles (p < 0.005) and lip muscles (p < 0.005) in the prediction of stuttering, indicating behavior consistent with arousal and anticipation of speaking. The temporal study of these upper and lower facial muscles may facilitate early detection of stuttering, promote automated assessment of stuttering, and support behavioral therapies by providing automatic, non-invasive feedback in real time.
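The abstract does not state which attribution method the authors use to explain the CNN. Purely as an illustration of what a per-muscle "attribution vector" can look like, the NumPy sketch below assumes a hypothetical bias-free 1D temporal CNN over a (frames × facial action-unit channels) intensity sequence and computes a gradient × input attribution summed over time, yielding one score per facial-muscle channel. The input layout, layer sizes, and the choice of gradient × input are all assumptions, not the paper's method.

```python
import numpy as np

# Hypothetical toy model, NOT the paper's architecture: one bias-free temporal
# convolution layer + ReLU + global average pooling + linear output (logit).

def temporal_cnn_logit(x, w_conv, w_out):
    """x: (T, C) action-unit intensities over T frames (assumed input layout).
    w_conv: (K, W, C) K temporal filters of width W over all C channels.
    w_out: (K,) weights mapping pooled filter responses to a stutter logit."""
    T, C = x.shape
    K, W, _ = w_conv.shape
    L = T - W + 1  # number of valid convolution positions
    windows = np.stack([x[t:t + W] for t in range(L)])  # (L, W, C)
    pre = np.einsum("lwc,kwc->lk", windows, w_conv)  # (L, K) pre-activations
    act = np.maximum(pre, 0.0)  # ReLU
    pooled = act.mean(axis=0)  # average over time, (K,)
    logit = pooled @ w_out
    return logit, pre, L

def attribution_vector(x, w_conv, w_out):
    """Gradient x input attribution, summed over frames: one score per channel."""
    logit, pre, L = temporal_cnn_logit(x, w_conv, w_out)
    K, W, C = w_conv.shape
    # d logit / d pre[l, k] = w_out[k] / L where the ReLU is active, else 0
    d_pre = (pre > 0) * (w_out / L)  # (L, K)
    grad = np.zeros_like(x)
    for l in range(L):
        # the window starting at frame l feeds back into frames l .. l+W-1
        grad[l:l + W] += np.einsum("k,kwc->wc", d_pre[l], w_conv)
    attr = (grad * x).sum(axis=0)  # (C,) per-channel attribution
    return attr, logit

# Usage with random data: 30 frames, 5 hypothetical action-unit channels
rng = np.random.default_rng(0)
x = rng.random((30, 5))
w_conv = rng.normal(size=(4, 3, 5))
w_out = rng.normal(size=4)
attr, logit = attribution_vector(x, w_conv, w_out)
print(attr.shape)  # (5,)
print(np.isclose(attr.sum(), logit))  # True
```

Because this toy network has no biases and uses ReLU, it is positively homogeneous, so the gradient × input attributions sum exactly to the logit; that property makes the per-channel scores easy to sanity-check, although real models with biases would need a baseline-aware method instead.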
