End-to-end Ensemble-based Feature Selection for Paralinguistics Tasks

Paper Title

End-to-end Ensemble-based Feature Selection for Paralinguistics Tasks

Authors

Tamás Grósz, Mittul Singh, Sudarsana Reddy Kadiri, Hemant Kathania, Mikko Kurimo

Abstract

The events of recent years have highlighted the importance of telemedicine solutions, which could allow remote treatment and diagnosis. Relatedly, Computational Paralinguistics, a unique subfield of Speech Processing, aims to extract information about the speaker and forms an important part of telemedicine applications. In this work, we focus on two paralinguistic problems: mask detection and breathing state prediction. Solutions developed for these tasks could be invaluable and have the potential to help monitor and limit the spread of a virus like COVID-19. The current state-of-the-art methods proposed for these tasks are ensembles based on deep neural networks, such as ResNets, in conjunction with feature engineering. Although these ensembles can achieve high accuracy, they also have a large footprint and require substantial computational power, which reduces portability to devices with limited resources. These drawbacks also mean that the previously proposed solutions are infeasible to use in a telemedicine system due to their size and speed. On the other hand, employing lighter feature-engineered systems can be laborious and adds further complexity, making it difficult to create a deployable system quickly. This work proposes an ensemble-based automatic feature selection method to enable the development of fast and memory-efficient systems. In particular, we propose an output-gradient-based method to discover essential features using large, well-performing ensembles before training a smaller one. In our experiments, we observed considerable (25-32%) reductions in inference times using neural network ensembles based on output-gradient-based features. Our method offers a simple way to increase the speed of the system and enable real-time usage while maintaining competitive results with a larger-footprint ensemble using all spectral features.
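
The abstract does not spell out how the output-gradient-based selection is computed, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes a toy ensemble of logistic models (standing in for the large neural ensembles), scores each input feature by the mean absolute gradient of the model output with respect to that feature, averages the scores over the ensemble, and keeps the top k features. The function names, the toy data, and the choice of mean absolute gradient as the score are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def output_gradients(weights, bias, x):
    """Gradient of a logistic model's output w.r.t. each input feature.

    For p = sigmoid(x @ w + b), the derivative is dp/dx = p * (1 - p) * w.
    Returns an array of shape (n_samples, n_features).
    """
    p = sigmoid(x @ weights + bias)
    return (p * (1.0 - p))[:, None] * weights[None, :]


def select_features(ensemble, x, k):
    """Rank features by mean absolute output gradient, averaged over the ensemble.

    `ensemble` is a list of (weights, bias) pairs; this representation is an
    assumption made for illustration only.
    """
    scores = np.zeros(x.shape[1])
    for weights, bias in ensemble:
        scores += np.abs(output_gradients(weights, bias, x)).mean(axis=0)
    scores /= len(ensemble)
    top_k = np.argsort(scores)[::-1][:k]  # indices of the k highest scores
    return np.sort(top_k), scores


# Toy data: 6 features, of which only features 1 and 4 carry large weights.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 6))
ensemble = []
for _ in range(5):
    w = rng.normal(scale=0.01, size=6)  # near-zero weights everywhere
    w[1] += 2.0                         # feature 1 matters
    w[4] -= 1.5                         # feature 4 matters
    ensemble.append((w, 0.0))

selected, scores = select_features(ensemble, x, k=2)
print("selected feature indices:", selected)
```

In a pipeline of the kind the abstract describes, the selected indices would then be used to slice the input features before training a smaller ensemble; that second stage is omitted here.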
