论文标题
QR分解,SVD和PCA的Givens旋转与数据库相连
Givens Rotations for QR Decomposition, SVD and PCA over Database Joins
论文作者
论文摘要
本文介绍了Figaro,这是一种用于计算自然连接在关系数据上定义的基质QR分解中的上三角矩阵的算法。 Figaro的主要新颖之处在于,它将QR分解推向了加入。这导致了几种理想的属性。对于无环连接,它需要数据库大小的时间性时间,并且独立于联接大小。其执行等同于应用与联接大小成比例的givens旋转序列的应用。它相对于经典QR分解算法的舍入误差数与数据库大小相对于联接输出大小而言是相当的。 QR分解位于许多线性代数计算的核心,包括奇异值分解(SVD)和主要成分分析(PCA)。我们展示了如何使用Figaro来计算QR分解,SVD和JOIN输出的PCA中的正交矩阵,而无需实现连接输出。 一系列实验验证了Figaro可以在运行时性能和数值准确性方面均优于Lapack库Intel MKL,其因子与联接输出和输入大小之间的差距成正比。
This article introduces Figaro, an algorithm for computing the upper-triangular matrix in the QR decomposition of the matrix defined by the natural join over relational data. Figaro's main novelty is that it pushes the QR decomposition past the join. This leads to several desirable properties. For acyclic joins, it takes time linear in the database size and independent of the join size. Its execution is equivalent to the application of a sequence of Givens rotations proportional to the join size. Its number of rounding errors relative to the classical QR decomposition algorithms is on par with the database size relative to the join output size. The QR decomposition lies at the core of many linear algebra computations including the singular value decomposition (SVD) and the principal component analysis (PCA). We show how Figaro can be used to compute the orthogonal matrix in the QR decomposition, the SVD and the PCA of the join output without the need to materialize the join output. A suite of experiments validate that Figaro can outperform both in runtime performance and numerical accuracy the LAPACK library Intel MKL by a factor proportional to the gap between the sizes of the join output and input.