Paper Title

Improved Subsampled Randomized Hadamard Transform for Linear SVM

Paper Authors

Zijian Lei, Liang Lan

Paper Abstract

The Subsampled Randomized Hadamard Transform (SRHT), a popular random projection method that can efficiently project $d$-dimensional data into an $r$-dimensional space ($r \ll d$) in $O(d \log d)$ time, has been widely used to address the challenge of high dimensionality in machine learning. SRHT works by rotating the input data matrix $\mathbf{X} \in \mathbb{R}^{n \times d}$ with a randomized Walsh-Hadamard transform, followed by uniform column sampling on the rotated matrix. Despite its advantages, one limitation of SRHT is that it generates the new low-dimensional embedding without considering any specific properties of a given dataset. Therefore, this data-independent random projection method may result in inferior and unstable performance when used for a particular machine learning task, e.g., classification. To overcome this limitation, we analyze the effect of using SRHT for random projection in the context of linear SVM classification. Based on our analysis, we propose importance sampling and deterministic top-$r$ sampling, in place of SRHT's uniform sampling, to produce effective low-dimensional embeddings. In addition, we propose a new supervised non-uniform sampling method. Our experimental results demonstrate that our proposed methods achieve higher classification accuracies than SRHT and other random projection methods on six real-life datasets.
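For readers unfamiliar with the baseline transform, below is a minimal NumPy/SciPy sketch of plain SRHT as described in the abstract. The function name `srht` and its parameters are illustrative, not from the paper, and the explicit Hadamard matrix stands in for the fast $O(d \log d)$ Walsh-Hadamard transform used in practice. The paper's proposed variants would replace the uniform column sampling step with importance, top-$r$, or supervised sampling.

```python
import numpy as np
from scipy.linalg import hadamard

def srht(X, r, rng=None):
    """Plain SRHT: project an (n x d) matrix X down to (n x r), r << d.

    A minimal sketch of the standard transform. The paper's methods
    differ only in how the r columns are chosen at the end.
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape
    # Pad the feature dimension to the next power of two so the
    # Walsh-Hadamard matrix is defined.
    d_pad = 1 << (d - 1).bit_length()
    X_pad = np.hstack([X, np.zeros((n, d_pad - d))])
    # Random sign flips D, then the Hadamard rotation H (the
    # "randomized Walsh-Hadamard transform"). An explicit O(d^2)
    # matrix is used here for clarity; a fast FWHT would give
    # the O(d log d) cost quoted in the abstract.
    signs = rng.choice([-1.0, 1.0], size=d_pad)
    H = hadamard(d_pad) / np.sqrt(d_pad)
    X_rot = (X_pad * signs) @ H
    # Uniform column sampling with rescaling, as in plain SRHT.
    cols = rng.choice(d_pad, size=r, replace=False)
    return X_rot[:, cols] * np.sqrt(d_pad / r)
```

For example, `X_low = srht(X, r=256)` maps an `n x d` data matrix to an `n x 256` embedding while approximately preserving pairwise distances, after which a linear SVM can be trained on `X_low` in the reduced space.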
