Wider or Deeper Neural Network Architecture for Acoustic Scene Classification with Mismatched Recording Devices

Paper Title

Wider or Deeper Neural Network Architecture for Acoustic Scene Classification with Mismatched Recording Devices

Authors

Pham, Lam; Dinh, Khoa; Ngo, Dat; Tang, Hieu; Schindler, Alexander

Abstract

In this paper, we present a robust, low-complexity system for Acoustic Scene Classification (ASC), the task of identifying the scene in which an audio recording was made. We first construct an ASC baseline system built on a novel inception-residual-based network architecture designed to handle mismatched recording devices. To further improve performance while keeping model complexity low, we apply two techniques to the baseline: an ensemble of multiple spectrograms and channel reduction. Through extensive experiments on the benchmark DCASE 2020 Task 1A Development dataset, our best model achieves an accuracy of 69.9% with only 2.4M trainable parameters, which is competitive with state-of-the-art ASC systems and shows potential for real-life applications on edge devices.

Paper Title