Paper Title
Entanglement and Tensor Networks for Supervised Image Classification
Paper Authors
Paper Abstract
Tensor networks, originally designed to address computational problems in quantum many-body physics, have recently been applied to machine learning tasks. However, compared to quantum physics, where the reasons for the success of tensor network approaches over the last 30 years are well understood, very little is yet known about why these techniques work for machine learning. The goal of this paper is to investigate entanglement properties of tensor network models in a current machine learning application, in order to uncover general principles that may guide future developments. We revisit the use of tensor networks for supervised image classification using the MNIST data set of handwritten digits, as pioneered by Stoudenmire and Schwab [Adv. in Neur. Inform. Proc. Sys. 29, 4799 (2016)]. Firstly, we hypothesize about which state the tensor network might be learning during training. For that purpose, we propose a plausible candidate state $|Σ_{\ell}\rangle$ (built as a superposition of product states corresponding to images in the training set) and investigate its entanglement properties. We conclude that $|Σ_{\ell}\rangle$ is so robustly entangled that it cannot be approximated by the tensor network used in that work, which must therefore be representing a very different state. Secondly, we use tensor networks with a block product structure, in which entanglement is restricted within small blocks of $n \times n$ pixels/qubits. We find that these states are extremely expressive (e.g., a training accuracy of $99.97\%$ already for $n=2$), suggesting that long-range entanglement may not be essential for image classification. However, in our current implementation, optimization leads to over-fitting, resulting in test accuracies that are not competitive with other current approaches.
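As a minimal schematic of the two constructions mentioned above, assuming the local pixel feature map of the referenced Stoudenmire–Schwab encoding (the symbols $N_{\ell}$, $x_{j,i}$, and $N_{\mathrm{pix}}$ are introduced here only for illustration and do not appear in the abstract): the candidate state $|Σ_{\ell}\rangle$ for class $\ell$ is a superposition of the product-state encodings of the $N_{\ell}$ training images of that class,
$$ |Σ_{\ell}\rangle \;\propto\; \sum_{j=1}^{N_{\ell}} \, \bigotimes_{i=1}^{N_{\mathrm{pix}}} |\phi(x_{j,i})\rangle, \qquad |\phi(x)\rangle = \cos\!\big(\tfrac{\pi}{2}x\big)|0\rangle + \sin\!\big(\tfrac{\pi}{2}x\big)|1\rangle, $$
where $x_{j,i}\in[0,1]$ is the normalized intensity of pixel $i$ in image $j$. In the same notation, the block product ansatz confines all entanglement within individual blocks $b$ of $n \times n$ pixels,
$$ |Ψ_{\ell}\rangle \;=\; \bigotimes_{b} |ψ^{(b)}_{\ell}\rangle, \qquad |ψ^{(b)}_{\ell}\rangle \in \big(\mathbb{C}^{2}\big)^{\otimes n^{2}}, $$
so that no entanglement is generated between pixels belonging to different blocks.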