Paper Title

Much Easier Said Than Done: Falsifying the Causal Relevance of Linear Decoding Methods

Paper Authors

Lucas Hayne, Abhijit Suresh, Hunar Jain, Rahul Kumar, R. McKell Carter

Paper Abstract

Linear classifier probes are frequently utilized to better understand how neural networks function. Researchers have approached the problem of determining unit importance in neural networks by probing their learned, internal representations. Linear classifier probes identify highly selective units as the most important for network function. Whether or not a network actually relies on high selectivity units can be tested by removing them from the network using ablation. Surprisingly, when highly selective units are ablated they only produce small performance deficits, and even then only in some cases. In spite of the absence of ablation effects for selective neurons, linear decoding methods can be effectively used to interpret network function, leaving their effectiveness a mystery. To falsify the exclusive role of selectivity in network function and resolve this contradiction, we systematically ablate groups of units in subregions of activation space. Here, we find a weak relationship between neurons identified by probes and those identified by ablation. More specifically, we find that an interaction between selectivity and the average activity of the unit better predicts ablation performance deficits for groups of units in AlexNet, VGG16, MobileNetV2, and ResNet101. Linear decoders are likely somewhat effective because they overlap with those units that are causally important for network function. Interpretability methods could be improved by focusing on causally important units.
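The core experiment the abstract summarizes can be pictured in a few lines of PyTorch. The sketch below is our illustration, not the authors' code: it scores each unit in one AlexNet layer with a common class-selectivity index, zeroes out the most selective units via a forward hook, and compares accuracy before and after ablation. The layer choice, the particular selectivity formula, the top-k ablation strategy, and the random placeholder data are all assumptions made for the sketch.

```python
# Minimal sketch of a probe-vs-ablation comparison (illustrative, not the paper's code).
import torch
import torchvision.models as models

model = models.alexnet(weights=models.AlexNet_Weights.DEFAULT).eval()
layer = model.features[8]  # an intermediate conv layer, chosen arbitrarily

# --- placeholder evaluation set; swap in a real labeled dataset in practice ---
images = torch.randn(64, 3, 224, 224)
labels = torch.randint(0, 1000, (64,))

# Capture per-unit activations (mean over spatial dims) at the chosen layer.
acts = {}
def save_acts(_, __, out):
    acts["a"] = out.mean(dim=(2, 3)).detach()  # shape: (batch, channels)
hook = layer.register_forward_hook(save_acts)

with torch.no_grad():
    base_acc = (model(images).argmax(1) == labels).float().mean().item()
hook.remove()

# One common class-selectivity index per unit: (mu_max - mu_rest) / (mu_max + mu_rest),
# where mu_max is the largest class-conditional mean activation and mu_rest is the
# mean over the remaining classes. The paper may use a different index.
a = acts["a"]
class_means = torch.stack([a[labels == c].mean(0) for c in labels.unique()])
mu_max, _ = class_means.max(0)
mu_rest = (class_means.sum(0) - mu_max) / max(class_means.shape[0] - 1, 1)
selectivity = (mu_max - mu_rest) / (mu_max + mu_rest + 1e-8)

# Ablate the k most selective units by zeroing their output channels.
k = 16
top_units = selectivity.topk(k).indices
def ablate(_, __, out):
    out[:, top_units] = 0.0
    return out
layer.register_forward_hook(ablate)

with torch.no_grad():
    ablated_acc = (model(images).argmax(1) == labels).float().mean().item()

print(f"accuracy deficit from ablating {k} selective units: {base_acc - ablated_acc:.3f}")
```

The paper's finding is that deficits measured this way correlate only weakly with selectivity alone; an interaction between selectivity and average unit activity is a better predictor, which the same scaffold can test by ranking units on that interaction instead.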
