Paper Title

Joint Analysis of Sound Events and Acoustic Scenes Using Multitask Learning


Paper Authors

Noriyuki Tonami, Keisuke Imoto, Ryosuke Yamanishi, Yoichi Yamashita

Paper Abstract

Sound event detection (SED) and acoustic scene classification (ASC) are important research topics in environmental sound analysis. Many research groups have addressed SED and ASC using neural-network-based methods, such as the convolutional neural network (CNN), recurrent neural network (RNN), and convolutional recurrent neural network (CRNN). The conventional methods address SED and ASC separately even though sound events and acoustic scenes are closely related to each other. For example, in the acoustic scene "office," the sound events "mouse clicking" and "keyboard typing" are likely to occur. Therefore, it is expected that information on sound events and acoustic scenes will be of mutual aid for SED and ASC. In this paper, we propose multitask learning for joint analysis of sound events and acoustic scenes, in which the parts of the networks holding information on sound events and acoustic scenes in common are shared. Experimental results obtained using the TUT Sound Events 2016/2017 and TUT Acoustic Scenes 2016 datasets indicate that the proposed method improves the performance of SED and ASC by 1.31 and 1.80 percentage points in terms of the F-score, respectively, compared with the conventional CRNN-based method.
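The core idea in the abstract, a shared feature extractor feeding two task-specific heads (frame-wise SED and clip-level ASC) trained with a joint loss, can be sketched in a few lines. This is a minimal illustrative forward pass, not the paper's actual CRNN: all layer shapes, the single tanh shared layer, the mean pooling for the scene head, and the task-weight hyperparameter `alpha` are assumptions made for the sketch.

```python
import numpy as np

# Hypothetical sizes: T frames, F mel bins, H shared units,
# E event classes, S scene classes (illustrative, not from the paper)
T, F, H, E, S = 100, 64, 32, 25, 4

rng = np.random.default_rng(0)
x = rng.standard_normal((T, F))            # log-mel features of one clip

# Shared layer: stands in for the shared CRNN blocks holding
# information common to sound events and acoustic scenes
W_shared = rng.standard_normal((F, H)) * 0.1
h = np.tanh(x @ W_shared)                  # (T, H) shared features

# SED head: frame-wise, multi-label event activity via sigmoid
W_sed = rng.standard_normal((H, E)) * 0.1
p_event = 1.0 / (1.0 + np.exp(-(h @ W_sed)))   # (T, E) in (0, 1)

# ASC head: clip-level scene posterior via pooling + softmax
W_asc = rng.standard_normal((H, S)) * 0.1
z = h.mean(axis=0) @ W_asc                 # temporal mean pooling
p_scene = np.exp(z - z.max())
p_scene /= p_scene.sum()                   # (S,) sums to 1

# Joint multitask objective: binary cross-entropy for SED plus a
# weighted cross-entropy for ASC (alpha is an assumed hyperparameter)
y_event = (rng.random((T, E)) < 0.1).astype(float)  # dummy frame labels
y_scene = 2                                         # dummy scene index
eps = 1e-9
l_sed = -np.mean(y_event * np.log(p_event + eps)
                 + (1 - y_event) * np.log(1 - p_event + eps))
l_asc = -np.log(p_scene[y_scene] + eps)
alpha = 0.001
loss = l_sed + alpha * l_asc
```

Because the shared weights `W_shared` receive gradients from both loss terms during training, features useful for recognizing the scene (e.g. "office") can also inform event detection (e.g. "keyboard typing"), which is the mutual-aid effect the abstract describes.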
