Paper Title
A Multi-criteria Approach for Fast and Outlier-aware Representative Selection from Manifolds
Authors
Abstract
The problem of representative selection amounts to sampling a few informative exemplars from large datasets. This paper presents MOSAIC, a novel approach to representative selection from high-dimensional data that may exhibit non-linear structures. Resting upon a novel quadratic formulation, our method advances a multi-criteria selection approach that maximizes the global representation power of the sampled subset, ensures diversity, and rejects disruptive information by effectively detecting outliers. Through theoretical analyses, we characterize the obtained sketch and reveal that the sampled representatives maximize a well-defined notion of data coverage in a transformed space. In addition, we present a highly scalable randomized implementation of the proposed algorithm that is shown to bring about substantial speedups. Extensive experiments on both real and synthetic data, with comparisons to state-of-the-art algorithms, demonstrate MOSAIC's superiority in achieving the desired characteristics of a representative subset all at once while exhibiting remarkable robustness to various outlier types.