论文标题
特殊用途硬件上的二进制矩阵分解
Binary matrix factorization on special purpose hardware
论文作者
论文摘要
数据挖掘中的许多基本问题可以简化为一个或多个NP-HARD组合优化问题。与使用通用计算机相比,量子和量子启发的硬件等新型技术的最新进展有望解决这些问题,但通常要求以特殊形式建模该问题,例如iSing或二进制的二进制二进制优化(QUBO)模型,以便利用这些设备。在这项工作中,我们关注重要的二进制矩阵分解(BMF)问题,该问题在数据挖掘中有许多应用。我们提出了两个BMF的QUBO公式。我们展示了如何轻松地将聚类约束纳入这些配方中。我们认为的特殊用途硬件在可以处理的变量数量上受到限制,这在分解大型矩阵时会提出挑战。我们提出了一种基于抽样的方法来克服这一挑战,从而使我们能够分解大型矩形矩阵。除了这些方法外,我们还提出了一种简单的基线算法,在某些情况下,它的表现优于我们更复杂的方法。我们对量子启发的互补金属 - 氧化物 - 氧化物 - 氧化物 - 氧化型(CMOS)Enealeer进行了富士通数字退火器的实验,包括基因表达数据,包括基因表达数据。这些实验表明,与竞争方法相比,我们的方法能够产生更准确的BMF。
Many fundamental problems in data mining can be reduced to one or more NP-hard combinatorial optimization problems. Recent advances in novel technologies such as quantum and quantum-inspired hardware promise a substantial speedup for solving these problems compared to when using general purpose computers but often require the problem to be modeled in a special form, such as an Ising or quadratic unconstrained binary optimization (QUBO) model, in order to take advantage of these devices. In this work, we focus on the important binary matrix factorization (BMF) problem which has many applications in data mining. We propose two QUBO formulations for BMF. We show how clustering constraints can easily be incorporated into these formulations. The special purpose hardware we consider is limited in the number of variables it can handle which presents a challenge when factorizing large matrices. We propose a sampling based approach to overcome this challenge, allowing us to factorize large rectangular matrices. In addition to these methods, we also propose a simple baseline algorithm which outperforms our more sophisticated methods in a few situations. We run experiments on the Fujitsu Digital Annealer, a quantum-inspired complementary metal-oxide-semiconductor (CMOS) annealer, on both synthetic and real data, including gene expression data. These experiments show that our approach is able to produce more accurate BMFs than competing methods.