Paper Title
Beat Transformer: Demixed Beat and Downbeat Tracking with Dilated Self-Attention
Paper Authors
Paper Abstract
We propose Beat Transformer, a novel Transformer encoder architecture for joint beat and downbeat tracking. Different from previous models that track beats solely based on the spectrogram of an audio mixture, our model deals with demixed spectrograms with multiple instrument channels. This is inspired by the fact that humans perceive metrical structures from richer musical contexts, such as chord progression and instrumentation. To this end, we develop a Transformer model with both time-wise attention and instrument-wise attention to capture deep-buried metrical cues. Moreover, our model adopts a novel dilated self-attention mechanism, which achieves powerful hierarchical modelling with only linear complexity. Experiments demonstrate a significant improvement in demixed beat tracking over the non-demixed version. Also, Beat Transformer achieves up to a 4-percentage-point improvement in downbeat tracking accuracy over TCN architectures. We further discover an interpretable attention pattern that mirrors our understanding of hierarchical metrical structures.
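To make the linear-complexity claim concrete: in dilated self-attention, each time step attends only to a small neighborhood of positions spaced at a fixed dilation rate, so total cost grows as O(n · k) in sequence length n rather than O(n²). The snippet below is a minimal single-head sketch of this idea, not the authors' implementation; the function name, the `window` parameter, and the use of NumPy are illustrative assumptions.

```python
import numpy as np

def dilated_self_attention(x, dilation, window=2):
    """Toy single-head self-attention where position i attends only to
    positions i + j*dilation for j in [-window, window] (illustrative sketch,
    not the paper's exact mechanism). Per-step cost is constant in n,
    so the whole pass is linear in sequence length."""
    n, d = x.shape
    out = np.zeros_like(x)
    for i in range(n):
        # Gather the dilated neighborhood of position i (clipped to bounds).
        idx = [i + j * dilation for j in range(-window, window + 1)
               if 0 <= i + j * dilation < n]
        keys = x[idx]                           # (k, d) neighborhood
        scores = keys @ x[i] / np.sqrt(d)       # scaled dot-product scores
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                # softmax over the neighborhood
        out[i] = weights @ keys                 # weighted sum (values = keys here)
    return out
```

Stacking such layers with increasing dilation rates lets distant positions communicate through intermediate steps, which is what gives the mechanism its hierarchical receptive field.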