Spatio-Temporal Tensor Sketching via Adaptive Sampling

Paper Title

Spatio-Temporal Tensor Sketching via Adaptive Sampling

Authors

Jing Ma, Qiuchen Zhang, Joyce C. Ho, Li Xiong

Abstract

Mining massive spatio-temporal data can help a variety of real-world applications such as city capacity planning, event management, and social network analysis. The tensor representation can be used to capture the correlation between space and time and simultaneously exploit the latent structure of the spatial and temporal patterns in an unsupervised fashion. However, the increasing volume of spatio-temporal data has made it prohibitively expensive to store and analyze using tensor factorization. In this paper, we propose SkeTenSmooth, a novel tensor factorization framework that uses adaptive sampling to compress the tensor in a temporally streaming fashion and preserves the underlying global structure. SkeTenSmooth adaptively samples incoming tensor slices according to the detected data dynamics. Thus, the sketches are more representative and informative of the tensor dynamic patterns. In addition, we propose a robust tensor factorization method that can deal with the sketched tensor and recover the original patterns. Experiments on the New York City Yellow Taxi data show that SkeTenSmooth greatly reduces the memory cost and outperforms random sampling and fixed-rate sampling methods in terms of retaining the underlying patterns.
