Simple Pooling Front-ends For Efficient Audio Classification

Paper title

Simple Pooling Front-ends For Efficient Audio Classification

Authors

Xubo Liu, Haohe Liu, Qiuqiang Kong, Xinhao Mei, Mark D. Plumbley, Wenwu Wang

Abstract

Recently, there has been increasing interest in building efficient audio neural networks for on-device scenarios. Most existing approaches aim to reduce the size of audio neural networks using methods such as model pruning. In this work, we show that, instead of reducing model size with complex methods, eliminating the temporal redundancy in the input audio features (e.g., the mel-spectrogram) can be an effective approach to efficient audio classification. To this end, we propose a family of simple pooling front-ends (SimPFs), which use simple non-parametric pooling operations to reduce the redundant information within the mel-spectrogram. We perform extensive experiments on four audio classification tasks to evaluate the performance of SimPFs. Experimental results show that SimPFs can reduce the number of floating point operations (FLOPs) of off-the-shelf audio neural networks by more than half, with negligible degradation or even some improvement in audio classification performance.
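
To make the idea of a non-parametric pooling front-end concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the mel-spectrogram is stored as a list of frames (each frame a list of mel-bin values) and that pooling is applied over non-overlapping windows along the time axis. The abstract does not say which pooling operations are used, so the function name `pool_time`, the `factor` parameter, and the choice of mean and max pooling are illustrative assumptions. Plain Python lists are used only to keep the example dependency-free.

```python
from typing import List

Frame = List[float]  # one time frame: a value per mel bin


def pool_time(mel: List[Frame], factor: int, mode: str = "mean") -> List[Frame]:
    """Shrink a mel-spectrogram along time by pooling non-overlapping windows.

    Hypothetical helper for illustration: `factor` frames are merged into one,
    so the output has len(mel) // factor frames. Trailing frames that do not
    fill a complete window are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    if mode not in ("mean", "max"):
        raise ValueError("mode must be 'mean' or 'max'")

    pooled: List[Frame] = []
    for start in range(0, len(mel) - factor + 1, factor):
        window = mel[start:start + factor]
        # zip(*window) groups the values of each mel bin across the window.
        if mode == "mean":
            pooled.append([sum(bin_vals) / factor for bin_vals in zip(*window)])
        else:
            pooled.append([max(bin_vals) for bin_vals in zip(*window)])
    return pooled


# Usage: 5 frames with 2 mel bins each, pooled by a factor of 2.
mel = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0], [8.0, 9.0]]
halved = pool_time(mel, factor=2)  # 2 frames remain; the 5th is dropped
```

The operation has no learnable parameters, so it can sit in front of an existing network without retraining the front-end itself. How much computation the downstream network saves depends on how its cost scales with the number of input frames.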
