Paper Title

Deep Networks as Logical Circuits: Generalization and Interpretation

Paper Authors

Christopher Snyder, Sriram Vishwanath

Paper Abstract

Not only are Deep Neural Networks (DNNs) black box models, but we also frequently conceptualize them as such. We lack good interpretations of the mechanisms linking inputs to outputs. Therefore, we find it difficult to analyze in human-meaningful terms (1) what the network learned and (2) whether the network learned. We present a hierarchical decomposition of the DNN discrete classification map into logical (AND/OR) combinations of intermediate (True/False) classifiers of the input. Those classifiers that cannot be further decomposed, called atoms, are (interpretable) linear classifiers. Taken together, we obtain a logical circuit with linear classifier inputs that computes the same label as the DNN. This circuit does not structurally resemble the network architecture, and it may require many fewer parameters, depending on the configuration of weights. In these cases, we obtain simultaneously an interpretation and a generalization bound (for the original DNN), connecting two fronts which have historically been investigated separately. Unlike compression techniques, our representation is exact. We motivate the utility of this perspective by studying DNNs in simple, controlled settings, where we obtain superior generalization bounds despite using only combinatorial information (e.g., no margin information). We demonstrate how to "open the black box" on the MNIST dataset. We show that the learned, internal, logical computations correspond to semantically meaningful (unlabeled) categories that allow DNN descriptions in plain English. We improve the generalization of an already trained network by interpreting, diagnosing, and replacing components of the logical circuit that is the DNN.
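To make the circuit view concrete, here is a minimal Python sketch of the general idea, not the authors' decomposition algorithm: for a one-hidden-layer ReLU network, each hidden activation pattern carves out a region on which the network is linear, so the positive-label set is an OR over patterns of an AND of linear "atom" classifiers. All names here (net_label, circuit_label) are hypothetical, chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny one-hidden-layer ReLU classifier: label = [w2 @ relu(W1 @ x + b1) + b2 >= 0]
d, h = 2, 3
W1, b1 = rng.normal(size=(h, d)), rng.normal(size=h)
w2, b2 = rng.normal(size=h), rng.normal()

def net_label(x):
    return w2 @ np.maximum(W1 @ x + b1, 0.0) + b2 >= 0.0

# Logical-circuit view: for each hidden activation pattern s in {0,1}^h,
# the net is linear on the region where that pattern holds. The positive
# class is the OR over patterns of the AND of:
#   - h linear "atoms" fixing the pattern (hidden unit j active or not), and
#   - one linear atom for the sign of the (now linear) output.
def circuit_label(x):
    pre = W1 @ x + b1
    for s in range(2 ** h):
        bits = [(s >> j) & 1 for j in range(h)]
        # AND: every hidden unit's activation matches the pattern.
        if not all((pre[j] >= 0.0) == bool(bits[j]) for j in range(h)):
            continue
        # On this region the output is linear in x; one more atom decides.
        eff_w = (w2 * bits) @ W1
        eff_b = (w2 * bits) @ b1 + b2
        return eff_w @ x + eff_b >= 0.0
    return False  # unreachable: exactly one pattern always matches

# The circuit computes the same label as the DNN on random inputs.
xs = rng.normal(size=(1000, d))
assert all(net_label(x) == circuit_label(x) for x in xs)
print("circuit matches network on all sampled points")
```

The brute-force enumeration over all 2^h activation patterns above only illustrates why such a circuit can compute exactly the same label as the DNN; the paper's hierarchical decomposition is what makes the representation compact enough to yield interpretations and generalization bounds.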
