Paper Title
Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting
Paper Authors
Paper Abstract
Many real-world applications require the prediction of long sequence time-series, such as electricity consumption planning. Long sequence time-series forecasting (LSTF) demands a high prediction capacity of the model, which is the ability to efficiently capture precise long-range dependency coupling between output and input. Recent studies have shown the potential of the Transformer to increase the prediction capacity. However, there are several severe issues with the Transformer that prevent it from being directly applicable to LSTF, including quadratic time complexity, high memory usage, and the inherent limitation of the encoder-decoder architecture. To address these issues, we design an efficient Transformer-based model for LSTF, named Informer, with three distinctive characteristics: (i) a $ProbSparse$ self-attention mechanism, which achieves $O(L \log L)$ time complexity and memory usage, and has comparable performance on sequences' dependency alignment; (ii) self-attention distilling, which highlights dominating attention by halving the cascading layer input and efficiently handles extremely long input sequences; (iii) a generative-style decoder which, while conceptually simple, predicts the long time-series sequence in one forward operation rather than in a step-by-step way, drastically improving the inference speed of long-sequence predictions. Extensive experiments on four large-scale datasets demonstrate that Informer significantly outperforms existing methods and provides a new solution to the LSTF problem.
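To make the $ProbSparse$ idea in (i) concrete, the sketch below illustrates the mechanism as the abstract describes it: each query's "sparsity measurement" is estimated on a random subset of keys, only the top-u most active queries (u on the order of ln L) attend over all keys, and the remaining queries fall back to the mean of the values. This is a minimal, assumption-laden NumPy illustration (shapes, the sampling factor `c`, and the fallback choice are ours), not the authors' reference implementation.

```python
# Minimal sketch of ProbSparse self-attention (illustrative assumptions, not the
# official Informer code): active queries get full attention, lazy queries get mean(V).
import numpy as np

def probsparse_self_attention(Q, K, V, c=5):
    """Q, K, V: arrays of shape (L, d). Returns an (L, d) output."""
    L, d = Q.shape
    u = min(L, int(c * np.ceil(np.log(L))))          # number of "active" queries, ~c·ln L
    sample_k = min(L, int(c * np.ceil(np.log(L))))   # sampled keys for the measurement

    # 1) Estimate each query's sparsity measurement on a random key subset:
    #    M(q) = max_j(q·k_j/sqrt(d)) - mean_j(q·k_j/sqrt(d)); large M marks an active query.
    idx = np.random.choice(L, sample_k, replace=False)
    scores_sample = Q @ K[idx].T / np.sqrt(d)         # (L, sample_k)
    M = scores_sample.max(axis=1) - scores_sample.mean(axis=1)

    # 2) Compute full softmax attention only for the top-u queries.
    top = np.argsort(-M)[:u]
    scores = Q[top] @ K.T / np.sqrt(d)                # (u, L)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)

    # 3) Lazy queries take the mean of V; active queries take their attention output.
    out = np.repeat(V.mean(axis=0, keepdims=True), L, axis=0)
    out[top] = weights @ V
    return out

# Toy usage: a 96-step sequence with 64-dimensional embeddings.
x = np.random.randn(96, 64)
y = probsparse_self_attention(x, x, x)
print(y.shape)  # (96, 64)
```

Because both the measurement and the full attention are computed for only O(ln L) queries/keys, the dominant cost scales as O(L log L) rather than the O(L^2) of standard self-attention, which is the complexity claim made in the abstract.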