Learning from Taxonomy: Multi-label Few-Shot Classification for Everyday Sound Recognition

Paper Title

Learning from Taxonomy: Multi-label Few-Shot Classification for Everyday Sound Recognition

Authors

Jinhua Liang, Huy Phan, Emmanouil Benetos

Abstract

Everyday sound recognition aims to infer the types of sound events present in audio streams. Although many works have trained high-performing models in a fully supervised manner, these models still require large quantities of labelled data and are limited to a predefined set of classes. To overcome these drawbacks, this work first curates a new database named FSD-FS for multi-label few-shot audio classification. It then explores how to incorporate audio taxonomy into few-shot learning. Specifically, it proposes label-dependent prototypical networks (LaD-protonet) to exploit parent-children relationships between labels. In addition, it applies taxonomy-aware label smoothing techniques to boost model performance. Experiments demonstrate that LaD-protonet outperforms the original prototypical networks as well as other state-of-the-art methods. Moreover, its performance can be further improved when combined with taxonomy-aware label smoothing.
