Paper Title

Toward Knowledge-Driven Speech-Based Models of Depression: Leveraging Spectrotemporal Variations in Speech Vowels

Authors

Kexin Feng, Theodora Chaspari

Abstract

Psychomotor retardation associated with depression has been linked with tangible differences in vowel production. This paper investigates a knowledge-driven machine learning (ML) method that integrates spectrotemporal information of speech at the vowel level to identify depression. Low-level speech descriptors are learned by a convolutional neural network (CNN) that is trained for vowel classification. The temporal evolution of those low-level descriptors is modeled at a high level, within and across utterances, by a long short-term memory (LSTM) model that makes the final depression decision. A modified version of Local Interpretable Model-agnostic Explanations (LIME) is further used to identify the impact of low-level spectrotemporal vowel variation on the decisions and to observe the high-level temporal change in the depression likelihood. The proposed method outperforms baselines that model the spectrotemporal information in speech without integrating vowel-based information, as well as ML models trained with conventional prosodic and spectrotemporal features. The explainability analysis indicates that spectrotemporal information corresponding to non-vowel segments is less important than the vowel-based information. The explainability of the high-level information capturing the segment-by-segment decisions is further inspected for participants with and without depression. The findings from this work can provide the foundation for knowledge-driven, interpretable decision-support systems that can help clinicians better understand fine-grained temporal changes in speech data, ultimately augmenting mental health diagnosis and care.
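The abstract does not specify how LIME was modified, so the following is only a minimal sketch of the general perturbation idea behind LIME-style explanations, not the authors' method. It masks speech segments on and off, queries a black-box model, and reads per-segment importance from a linear surrogate. Two simplifications are assumed: all on/off combinations are enumerated instead of sampled, and all perturbations are weighted equally instead of by a proximity kernel. The names `toy_black_box` and `segment_importance`, the weights, and the feature values are hypothetical placeholders.

```python
from itertools import product
from math import exp


def toy_black_box(segment_values):
    """Hypothetical stand-in for a trained model: maps per-segment feature
    values to a pseudo 'depression likelihood' in (0, 1)."""
    weights = [0.9, -0.2, 1.4, 0.1]  # arbitrary illustrative weights
    score = sum(w * v for w, v in zip(weights, segment_values))
    return 1.0 / (1.0 + exp(-score))


def segment_importance(model, segment_values):
    """Estimate each segment's effect on the model output.

    Every on/off combination of segments is evaluated (a masked segment is
    replaced by 0.0). For this balanced design, the least-squares coefficient
    of a linear surrogate for segment j equals the mean output with segment j
    kept minus the mean output with segment j masked.
    """
    n = len(segment_values)
    masks = list(product([0, 1], repeat=n))
    outputs = [
        model([v if keep else 0.0 for v, keep in zip(segment_values, mask)])
        for mask in masks
    ]
    importance = []
    for j in range(n):
        kept = [y for mask, y in zip(masks, outputs) if mask[j] == 1]
        masked = [y for mask, y in zip(masks, outputs) if mask[j] == 0]
        importance.append(sum(kept) / len(kept) - sum(masked) / len(masked))
    return importance


if __name__ == "__main__":
    # Hypothetical segment labels and feature values, for illustration only.
    labels = ["vowel", "non-vowel", "vowel", "non-vowel"]
    values = [0.8, 0.5, 0.6, 0.9]
    for label, imp in zip(labels, segment_importance(toy_black_box, values)):
        print(f"{label:>9}: {imp:+.3f}")
```

Because the toy weights are arbitrary, the printed scores say nothing about real speech. The sketch only shows how importance could be aggregated by segment type (vowel versus non-vowel) once a trained model is substituted for `toy_black_box`.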
