Paper Title

A Multimodal Approach for Dementia Detection from Spontaneous Speech with Tensor Fusion Layer

Authors

Loukas Ilias, Dimitris Askounis, John Psarras

Abstract

Alzheimer's disease (AD) is a progressive neurological disorder, meaning that the symptoms develop gradually over the years. It is also the main cause of dementia, which affects memory, thinking skills, and mental abilities. Researchers have recently turned their attention to AD detection from spontaneous speech, since it constitutes a time-effective procedure. However, existing state-of-the-art works proposing multimodal approaches do not take into consideration the inter- and intra-modal interactions and rely on early and late fusion. To tackle these limitations, we propose deep neural networks that can be trained end-to-end and capture the inter- and intra-modal interactions. First, each audio file is converted into an image consisting of three channels, i.e., log-Mel spectrogram, delta, and delta-delta. Next, each transcript is passed through a BERT model followed by a gated self-attention layer. Similarly, each image is passed through a Swin Transformer followed by an independent gated self-attention layer. Acoustic features are also extracted from each audio file. Finally, the representation vectors from the different modalities are fed to a tensor fusion layer to capture the inter-modal interactions. Extensive experiments conducted on the ADReSS Challenge dataset show that our introduced approaches offer clear advantages over existing research initiatives, reaching Accuracy and F1-score of up to 86.25% and 85.48%, respectively.
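The audio-to-image conversion described in the abstract follows a standard recipe: compute a log-Mel spectrogram and stack it with its first- and second-order temporal derivatives as three channels. Below is a minimal sketch using librosa; the sampling rate, number of Mel bands, and other parameter values are illustrative assumptions, not values reported by the paper.

```python
import numpy as np
import librosa

def audio_to_three_channel_image(path, sr=16000, n_mels=128):
    """Convert an audio file into a 3-channel 'image':
    log-Mel spectrogram, its delta, and its delta-delta.
    Parameter values here are illustrative assumptions."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel, ref=np.max)       # channel 1
    delta = librosa.feature.delta(log_mel, order=1)      # channel 2
    delta2 = librosa.feature.delta(log_mel, order=2)     # channel 3
    return np.stack([log_mel, delta, delta2], axis=-1)   # (n_mels, T, 3)
```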
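A tensor fusion layer (in the sense of Zadeh et al., 2017) models inter-modal interactions as the outer product of the per-modality representation vectors, each padded with a constant 1 so that unimodal and bimodal sub-tensors survive inside the three-way product. The following PyTorch sketch shows this fusion for the three modalities used here (text, image, acoustic); the layer sizes and the final linear classifier are hypothetical choices, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class TensorFusion(nn.Module):
    """Fuse three modality vectors via their outer product.
    Appending a constant 1 to each vector preserves the
    unimodal and bimodal sub-tensors inside the 3-way product."""

    def __init__(self, d_text, d_image, d_acoustic, n_classes=2):
        super().__init__()
        fused_dim = (d_text + 1) * (d_image + 1) * (d_acoustic + 1)
        self.classifier = nn.Linear(fused_dim, n_classes)

    def forward(self, h_t, h_i, h_a):                   # each (B, d_*)
        one = h_t.new_ones(h_t.size(0), 1)
        h_t = torch.cat([h_t, one], dim=1)              # (B, d_text + 1)
        h_i = torch.cat([h_i, one], dim=1)
        h_a = torch.cat([h_a, one], dim=1)
        # 3-way outer product: (B, d_t+1, d_i+1, d_a+1)
        z = torch.einsum('bi,bj,bk->bijk', h_t, h_i, h_a)
        return self.classifier(z.flatten(start_dim=1))  # logits

# Example: text 32-d, image 32-d, acoustic 16-d -> fused 33*33*17 = 18513-d
fusion = TensorFusion(32, 32, 16)
logits = fusion(torch.randn(4, 32), torch.randn(4, 32), torch.randn(4, 16))
```

Note that the fused dimensionality grows multiplicatively with the modality dimensions, which is why the per-modality vectors are usually projected to small sizes before fusion.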
