Paper Title
MusicTM-Dataset for Joint Representation Learning among Sheet Music, Lyrics, and Musical Audio
Paper Authors
Paper Abstract
This work presents a music dataset named MusicTM-Dataset, which is designed to improve representation learning for different types of cross-modal retrieval (CMR). Few large music datasets covering three modalities are available for learning representations for CMR. To collect such a dataset, we expand the original musical notation into synthesized audio and generated sheet-music images, and build fine-grained alignments among the notation-based sheet-music images, audio clips, and syllable-denotation text, so that MusicTM-Dataset can be exploited to learn shared representations for multimodal data points. MusicTM-Dataset provides three modalities, consisting of sheet-music images, lyric text, and synthesized audio; their representations are extracted with several advanced models. In this paper, we introduce the background of music datasets and describe our data-collection process. Based on our dataset, we implement several baseline methods for CMR tasks. MusicTM-Dataset is accessible at https://github.com/dddzeng/MusicTM-Dataset.
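To make the fine-grained alignment concrete, the following is a minimal Python sketch of how one aligned data point (sheet-music image, audio clip, syllable-denotation text) might be represented and loaded. The class name, field names, and the tab-separated index format are illustrative assumptions for this sketch, not the dataset's actual schema or loading API.

    from dataclasses import dataclass
    from typing import List

    # Hypothetical structure of one fine-grained aligned sample in MusicTM-Dataset.
    # Field names are assumptions for illustration only.
    @dataclass
    class MusicTMSample:
        sheet_image_path: str      # rendered sheet-music image for one notation segment
        audio_clip_path: str       # synthesized audio clip for the same segment
        syllables: List[str]       # syllable-level lyric tokens aligned to the notes

    def load_samples(index_file: str) -> List[MusicTMSample]:
        """Read an assumed tab-separated index: image path, audio path, syllables."""
        samples = []
        with open(index_file, encoding="utf-8") as f:
            for line in f:
                image_path, audio_path, syllable_text = line.rstrip("\n").split("\t")
                samples.append(MusicTMSample(image_path, audio_path, syllable_text.split()))
        return samples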