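A Deep Joint Sparse Non-negative Matrix Factorization Framework for Identifying the Common and Subject-specific Functional Units of Tongue Motion During Speech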

Paper Title

A Deep Joint Sparse Non-negative Matrix Factorization Framework for Identifying the Common and Subject-specific Functional Units of Tongue Motion During Speech

Authors

Woo, Jonghye, Xing, Fangxu, Prince, Jerry L., Stone, Maureen, Gomez, Arnold, Reese, Timothy G., Wedeen, Van J., Fakhri, Georges El

Abstract

Intelligible speech is produced by creating varying internal local muscle groupings (i.e., functional units) that are generated in a systematic and coordinated manner. There are two major challenges in characterizing and analyzing functional units. First, due to the complex and convoluted nature of tongue structure and function, it is of great importance to develop a method that can accurately decode complex muscle coordination patterns during speech. Second, it is challenging to keep the identified functional units comparable across subjects due to their substantial variability. In this work, to address these challenges, we develop a new deep learning framework to identify the common and subject-specific functional units of tongue motion during speech. Our framework hinges on joint deep graph-regularized sparse non-negative matrix factorization (NMF) using motion quantities derived from displacements measured by tagged magnetic resonance imaging (MRI). More specifically, we transform NMF with sparse and graph regularizations into modular architectures akin to deep neural networks by unfolding the Iterative Shrinkage-Thresholding Algorithm (ISTA), in order to learn interpretable building blocks and the associated weighting maps. We then apply spectral clustering to the common and subject-specific weighting maps, from which we jointly determine the common and subject-specific functional units. Experiments carried out with simulated datasets show that the proposed method achieved clustering performance on par with or better than that of the comparison methods. Experiments carried out with in vivo tongue motion data show that the proposed method can determine the common and subject-specific functional units with increased interpretability and decreased size variability.
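The two stages named in the abstract (ISTA iterations on a sparse, graph-regularized NMF, followed by spectral clustering of the weighting map) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation and covers a single subject only, without the joint common/subject-specific split. Several assumptions are made: X is arranged as motion quantities (rows) by tissue points (columns); the objective is assumed to be 0.5*||X - WH||_F^2 + lam*||H||_1 + (mu/2)*tr(H L H^T) with W, H >= 0; the spatial graph is a hypothetical chain over points; and step sizes and thresholds are fixed from the Lipschitz constant rather than learned per layer, as they would be in a true unfolded network. All function names (chain_laplacian, ista_layers_h, sparse_graph_nmf, spectral_clustering) are hypothetical.

```python
import numpy as np


def soft_threshold_nonneg(z, thresh):
    """Proximal operator of thresh * ||.||_1 combined with the constraint H >= 0."""
    return np.maximum(z - thresh, 0.0)


def chain_laplacian(n):
    """Hypothetical spatial graph: points 0..n-1 on a line, neighbours linked.
    Returns the unnormalized Laplacian L = D - A."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A


def ista_layers_h(X, W, L, H, n_layers, lam, mu):
    """A fixed number of ISTA iterations ("layers") on H for
    0.5*||X - W H||_F^2 + lam*||H||_1 + (mu/2)*tr(H L H^T), H >= 0.
    An unfolded network would learn per-layer step sizes and thresholds;
    here both are fixed from the Lipschitz constant (assumption)."""
    step = 1.0 / (np.linalg.norm(W.T @ W, 2) + mu * np.linalg.norm(L, 2))
    for _ in range(n_layers):
        grad = W.T @ (W @ H - X) + mu * H @ L  # gradient of the smooth terms
        H = soft_threshold_nonneg(H - step * grad, step * lam)
    return H


def sparse_graph_nmf(X, L, k, n_outer=200, lam=0.01, mu=0.1, seed=0):
    """Alternate ISTA layers on H with projected-gradient steps on W."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k))
    H = np.zeros((k, X.shape[1]))
    for _ in range(n_outer):
        H = ista_layers_h(X, W, L, H, n_layers=10, lam=lam, mu=mu)
        step_w = 1.0 / (np.linalg.norm(H @ H.T, 2) + 1e-12)
        W = np.maximum(W - step_w * (W @ H - X) @ H.T, 0.0)
        W = W / np.maximum(np.linalg.norm(W, axis=0), 1e-12)  # unit-norm building blocks
    return W, H


def spectral_clustering(H, n_clusters, n_iter=100):
    """Cluster the columns of H (one column per tissue point)."""
    Hn = H / np.maximum(np.linalg.norm(H, axis=0, keepdims=True), 1e-12)
    A = Hn.T @ Hn  # cosine affinity, non-negative because H >= 0
    np.fill_diagonal(A, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(A.sum(axis=1), 1e-12))
    M = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]  # D^-1/2 A D^-1/2
    _, vecs = np.linalg.eigh(M)
    U = vecs[:, -n_clusters:]  # eigenvectors of the largest eigenvalues
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    # Simple k-means on the spectral embedding, farthest-point initialization.
    centers = [U[0]]
    for _ in range(1, n_clusters):
        dist = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((U[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        new = np.array([U[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(n_clusters)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


# Toy usage: 20 hypothetical tissue points driven by two disjoint building blocks.
rng = np.random.default_rng(1)
W_true = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
H_true = np.zeros((2, 20))
H_true[0, :10] = rng.uniform(0.5, 1.5, 10)
H_true[1, 10:] = rng.uniform(0.5, 1.5, 10)
X = W_true @ H_true  # 6 motion quantities x 20 tissue points
W, H = sparse_graph_nmf(X, chain_laplacian(20), k=2)
labels = spectral_clustering(H, n_clusters=2)
```

In this toy setting the first ten and last ten points should fall into two separate clusters. Cosine affinity is one plausible choice for non-negative weighting maps, since it ignores per-point motion magnitude; the paper's actual affinity, graph construction, and learned layer parameters may differ.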
