Paper Title
Zero-Shot Cross-lingual Aphasia Detection using Automatic Speech Recognition
Paper Authors
Paper Abstract
Aphasia is a common speech and language disorder, typically caused by a brain injury or a stroke, that affects millions of people worldwide. Detecting and assessing aphasia in patients is a difficult, time-consuming process, and numerous attempts to automate it have been made, the most successful using machine learning models trained on aphasic speech data. As in many medical applications, aphasic speech data is scarce, and the problem is exacerbated in so-called "low-resource" languages, which are, for this task, most languages excluding English. We attempt to leverage available data in English and achieve zero-shot aphasia detection in low-resource languages such as Greek and French by using language-agnostic linguistic features. Current cross-lingual aphasia detection approaches rely on manually extracted transcripts. We propose an end-to-end pipeline using pre-trained Automatic Speech Recognition (ASR) models that share cross-lingual speech representations and are fine-tuned for our desired low-resource languages. To further boost our ASR model's performance, we also combine it with a language model. We show that our ASR-based end-to-end pipeline offers results comparable to previous setups using human-annotated transcripts.
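The pipeline the abstract describes has three stages: a cross-lingual ASR model (optionally rescored with a language model) transcribes the speech, language-agnostic linguistic features are computed from the transcript, and a detector trained on English data scores those features for a new language. The sketch below illustrates only the last two stages with simple placeholder features (mean utterance length, type-token ratio) and a threshold rule; these feature choices and the threshold are illustrative assumptions, not the paper's actual feature set or classifier.

```python
from dataclasses import dataclass


@dataclass
class Features:
    """Language-agnostic features computed from an ASR transcript."""
    mean_utt_len: float    # average words per utterance
    type_token_ratio: float  # lexical diversity: unique words / total words


def extract_features(utterances: list[str]) -> Features:
    """Compute simple surface features from a list of transcribed utterances.

    These features need no language-specific resources (no POS tagger,
    no lexicon), which is what makes a zero-shot cross-lingual setup possible.
    """
    words = [w for utt in utterances for w in utt.split()]
    n_utts = max(len(utterances), 1)
    mean_utt_len = len(words) / n_utts
    ttr = len(set(words)) / max(len(words), 1)
    return Features(mean_utt_len, ttr)


def detect_aphasia(feats: Features, utt_len_threshold: float = 4.0) -> bool:
    """Hypothetical decision rule standing in for a trained classifier.

    Aphasic speech often shows shortened, fragmented utterances, so this
    toy rule flags transcripts whose mean utterance length falls below a
    threshold. A real system would train a classifier on English data
    and apply it unchanged to Greek or French transcripts.
    """
    return feats.mean_utt_len < utt_len_threshold


# Toy transcripts (as an ASR stage might output them)
control = extract_features(
    ["she went to the store and bought bread", "then she came home"]
)
aphasic = extract_features(["went store", "bread", "home now"])

print(detect_aphasia(control))  # longer utterances -> not flagged
print(detect_aphasia(aphasic))  # short, fragmented utterances -> flagged
```

In the full pipeline these features would be computed from the output of a fine-tuned cross-lingual ASR model rather than from gold transcripts, which is exactly the substitution the paper evaluates.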