Title
Robust Time Series Chain Discovery with Incremental Nearest Neighbors
Authors
Abstract
Time series motif discovery is a fundamental task for identifying meaningful repeated patterns in time series. Recently, time series chains were introduced as an extension of time series motifs to identify continuously evolving patterns in time series data. Informally, a time series chain (TSC) is a temporally ordered set of time series subsequences in which every subsequence is similar to the one that precedes it, but the last and the first can be arbitrarily dissimilar. TSCs have been shown to reveal latent, continuously evolving trends in time series and to identify precursors of unusual events in complex systems. Despite their promising interpretability, we have observed that existing TSC definitions cannot accurately cover the evolving part of a time series: the discovered chains can easily be cut by noise and can include non-evolving patterns, which makes them impractical in real-world applications. Inspired by recent work that tracks how the nearest neighbor of a time series subsequence changes over time, we introduce a new TSC definition that is much more robust to noise in the data, in the sense that the resulting chains better locate the evolving patterns while excluding the non-evolving ones. We further propose two new quality metrics to rank the discovered chains. Through extensive empirical evaluations, we demonstrate that the proposed TSC definition is significantly more robust to noise than the state of the art, and that the top-ranked chains reveal meaningful regularities in a variety of real-world datasets.
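To make the informal chain definition concrete, the sketch below follows nearest-neighbor links between z-normalized subsequences. It assumes the bidirectional left/right nearest-neighbor linking used in earlier chain work (each element's right nearest neighbor is the next element, and that element's left nearest neighbor points back). This is an illustrative assumption, not the new robust definition proposed in this paper. The helper names `znorm`, `left_right_nn` and `anchored_chain`, the exclusion zone of `m // 2`, and the synthetic drifting-bump series are all hypothetical choices made for this example.

```python
from __future__ import annotations

import numpy as np


def znorm(x: np.ndarray) -> np.ndarray:
    """Z-normalize a subsequence; a constant subsequence maps to all zeros."""
    std = x.std()
    return (x - x.mean()) / std if std > 0 else np.zeros_like(x, dtype=float)


def left_right_nn(ts: np.ndarray, m: int) -> tuple[np.ndarray, np.ndarray]:
    """Brute-force left and right nearest neighbors of every length-m subsequence.

    Distances are z-normalized Euclidean. Offsets with |i - j| < excl are
    treated as trivial matches and skipped (hypothetical excl = m // 2).
    A value of -1 means no neighbor exists on that side.
    """
    n = len(ts) - m + 1
    excl = max(1, m // 2)
    subs = np.array([znorm(ts[i:i + m]) for i in range(n)])
    left = np.full(n, -1, dtype=int)
    right = np.full(n, -1, dtype=int)
    for i in range(n):
        dist = np.linalg.norm(subs - subs[i], axis=1)
        if i - excl >= 0:  # left candidates: j <= i - excl
            left[i] = int(np.argmin(dist[: i - excl + 1]))
        if i + excl < n:  # right candidates: j >= i + excl
            right[i] = i + excl + int(np.argmin(dist[i + excl:]))
    return left, right


def anchored_chain(left: np.ndarray, right: np.ndarray, start: int) -> list[int]:
    """Follow right-NN links from `start` while each link is confirmed by the left NN."""
    chain = [start]
    cur = start
    while right[cur] != -1 and left[right[cur]] == cur:
        cur = int(right[cur])  # strictly increasing, so the loop terminates
        chain.append(cur)
    return chain


# Usage: bumps whose peak drifts to the right, embedded in light noise.
rng = np.random.default_rng(0)
ts = rng.normal(0.0, 0.05, 500)
t = np.arange(30)
for k, pos in enumerate(range(10, 460, 75)):
    ts[pos:pos + 30] += np.exp(-((t - (8 + 2 * k)) ** 2) / 20.0)
left, right = left_right_nn(ts, m=30)
print(anchored_chain(left, right, start=10))
```

Brute force costs O(n²m); matrix-profile algorithms compute the same left and right neighbor indices much faster. Because each link only requires local similarity, the first and last elements of a returned chain can differ substantially. The sketch also shows the fragility the abstract describes: a single noise-induced mismatch between `right[cur]` and `left[...]` ends the chain.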