Paper Title

Interpretability of artificial neural network models in artificial intelligence vs. neuroscience

Paper Authors

Kohitij Kar, Simon Kornblith, Evelina Fedorenko

Paper Abstract

Computationally explicit hypotheses of brain function derived from machine learning (ML)-based models have recently revolutionized neuroscience. Despite the unprecedented ability of these artificial neural networks (ANNs) to capture responses in biological neural networks (brains), and our full access to all internal model components (unlike the brain), ANNs are often referred to as black-boxes with limited interpretability. Interpretability, however, is a multi-faceted construct that is used differently across fields. In particular, interpretability, or explainability, efforts in Artificial Intelligence (AI) focus on understanding how different model components contribute to its output (i.e., decision making). In contrast, the neuroscientific interpretability of ANNs requires explicit alignment between model components and neuroscientific constructs (e.g., different brain areas or phenomena, like recurrence or top-down feedback). Given the widespread calls to improve the interpretability of AI systems, we here highlight these different notions of interpretability and argue that the neuroscientific interpretability of ANNs can be pursued in parallel with, but independently from, the ongoing efforts in AI. Certain ML techniques (e.g., deep dream) can be leveraged in both fields, to ask what stimulus optimally activates the specific model features (feature visualization by optimization), or how different features contribute to the model's output (feature attribution). However, without appropriate brain alignment, certain features will remain uninterpretable to neuroscientists.
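Of the techniques named in the abstract, "feature visualization by optimization" (the deep-dream-style approach) is the most directly illustrated in code: starting from noise, the input image is adjusted by gradient ascent so that it maximally activates a chosen unit of a chosen layer. The sketch below shows the general idea in PyTorch; the network (torchvision's AlexNet), the layer index, the channel index, and the optimization hyperparameters are illustrative assumptions, not details from the paper.

```python
import torch
import torchvision.models as models

# Any convolutional network works; AlexNet is used here only because it is small.
# weights=None keeps the sketch self-contained; pretrained weights (e.g.
# weights="IMAGENET1K_V1") are needed for meaningful visualizations.
model = models.alexnet(weights=None).eval()
target_layer = model.features[8]   # assumed: a mid-level convolutional layer
target_channel = 42                # assumed: an arbitrary feature channel

# Capture the target layer's activations with a forward hook.
activations = {}
hook = target_layer.register_forward_hook(
    lambda module, inputs, output: activations.update(value=output)
)

# Optimize the input image itself, starting from random noise.
image = torch.randn(1, 3, 224, 224, requires_grad=True)
optimizer = torch.optim.Adam([image], lr=0.05)

for _ in range(200):
    optimizer.zero_grad()
    model(image)
    # Gradient ascent on the mean activation of the chosen channel
    # (expressed as descent on its negative).
    loss = -activations["value"][0, target_channel].mean()
    loss.backward()
    optimizer.step()

hook.remove()
# `image` now approximates a stimulus that strongly drives the chosen model feature.
```

Whether the resulting image is interpretable to a neuroscientist then depends, as the abstract argues, on whether the optimized model feature has been explicitly aligned with a neuroscientific construct (e.g., a particular brain area).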
