Paper Title
AudioTagging Done Right: 2nd comparison of deep learning methods for environmental sound classification
Paper Authors
Paper Abstract
After their sweeping success in vision and language tasks, pure attention-based neural architectures (e.g. DeiT) are rising to the top of audio tagging (AT) leaderboards, seemingly rendering obsolete traditional convolutional neural networks (CNNs), feed-forward networks, and recurrent networks. However, a closer look reveals great variability in published research: for instance, models initialized with pretrained weights perform drastically differently from those trained without pretraining, training time for a model varies from hours to weeks, and essential factors are often hidden in seemingly trivial details. This urgently calls for a comprehensive study, since our first comparison is half a decade old. In this work, we perform extensive experiments on AudioSet, the largest weakly-labeled sound event dataset available, and also analyze the results with respect to data quality and efficiency. We compare several state-of-the-art baselines on the AT task and study the performance and efficiency of two major categories of neural architectures: CNN variants and attention-based variants. We also closely examine their optimization procedures. Our open-sourced experimental results provide insights into the trade-offs among performance, efficiency, and optimization process for both practitioners and researchers. Implementation: https://github.com/lijuncheng16/AudioTaggingDoneRight