Paper title
Modeling Heterogeneous Statistical Patterns in High-dimensional Data by Adversarial Distributions: An Unsupervised Generative Framework
Authors
Abstract
Because collecting labels is prohibitively expensive and time-consuming, unsupervised methods are preferred in applications such as fraud detection. Such applications typically require modeling the intrinsic clusters in high-dimensional data, which often exhibits heterogeneous statistical patterns because the patterns of different clusters may appear in different dimensions. Existing methods model the data clusters on a selected subset of dimensions, yet globally omitting any dimension may damage the pattern of certain clusters. To address these issues, we propose a novel unsupervised generative framework called FIRD, which uses adversarial distributions to fit and disentangle the heterogeneous statistical patterns. When applied to discrete spaces, FIRD effectively distinguishes synchronized fraudsters from normal users. In addition, FIRD outperforms state-of-the-art (SOTA) anomaly detection methods on anomaly detection datasets, with an average AUC improvement of over 5%. Experimental results on various datasets verify that the proposed method better models the heterogeneous statistical patterns in high-dimensional data and benefits downstream applications.