Paper Title
Data Curves Clustering Using Common Patterns Detection
Paper Authors
Paper Abstract
Over the past decades, the volume of data that humanity produces and accumulates has expanded enormously. Every day, numerous smart devices, usually interconnected over the Internet, produce vast real-valued datasets. Time series representing datasets from completely unrelated domains, such as finance, weather, medical applications, and traffic control, are becoming more and more crucial in everyday human life. Analyzing and clustering these time series, or in general any kind of curves, can be critical for several human activities. This paper introduces the new Curves Clustering Using Common Patterns (3CP) methodology, which applies a repeated pattern detection algorithm to cluster sequences according to their shape and the similarity of the common patterns among time series, data curves and, eventually, any kind of discrete sequences. For this purpose, the Longest Expected Repeated Pattern Reduced Suffix Array (LERP-RSA) data structure is used in combination with the All Repeated Patterns Detection (ARPaD) algorithm to detect similarities among data curves with high accuracy and efficiency. These similarities can be used for clustering purposes, and the approach also provides additional flexibility and features.
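The general idea of clustering curves by shared patterns can be illustrated with a minimal sketch. It is not the LERP-RSA data structure or the ARPaD algorithm: it enumerates substrings naively instead of using a suffix array, and the equal-width discretization, the four-letter alphabet, the Jaccard similarity, the 0.5 threshold and all function names are assumptions made only for this illustration.

```python
from itertools import combinations

def discretize(curve, alphabet="abcd"):
    """Map each value to a symbol by equal-width binning over the curve's own range
    (an assumed scheme; it makes curves of the same shape but different scale match)."""
    lo, hi = min(curve), max(curve)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat curve
    k = len(alphabet)
    return "".join(alphabet[min(int((v - lo) / span * k), k - 1)] for v in curve)

def patterns(s, min_len=2, max_len=4):
    """Naively collect every substring of s with length in [min_len, max_len]."""
    return {s[i:i + n]
            for n in range(min_len, max_len + 1)
            for i in range(len(s) - n + 1)}

def similarity(a, b):
    """Jaccard similarity between the pattern sets of two discretized curves."""
    pa, pb = patterns(a), patterns(b)
    union = pa | pb
    return len(pa & pb) / len(union) if union else 0.0

def cluster(curves, threshold=0.5):
    """Group curve indices whose pairwise similarity reaches the threshold
    (single-linkage grouping via a small union-find)."""
    strings = [discretize(c) for c in curves]
    parent = list(range(len(strings)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(strings)), 2):
        if similarity(strings[i], strings[j]) >= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(len(strings)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical toy data: two curves with the same shape at different scales,
# and a third curve with a different shape.
curves = [
    [0, 1, 2, 3, 2, 1, 0, 1, 2, 3],
    [10, 20, 30, 40, 30, 20, 10, 20, 30, 40],
    [3, 3, 3, 0, 0, 0, 3, 3, 3, 0],
]
print(cluster(curves))
```

In this toy run the first two curves discretize to the same symbol string and fall into one group, while the third shares no pattern of length two or more with them. A suffix-array-based detector would replace the naive `patterns` function when sequences are long.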