Semantic Brain Decoding: From fMRI to Conceptually Similar Image Reconstruction of Visual Stimuli

Paper Title

Semantic Brain Decoding: from fMRI to conceptually similar image reconstruction of visual stimuli

Authors

Matteo Ferrante, Tommaso Boccato, Nicola Toschi

Abstract

Brain decoding is a field of computational neuroscience that uses measurable brain activity to infer mental states or internal representations of perceptual inputs. Therefore, we propose a novel approach to brain decoding that also relies on semantic and contextual similarity. We employ an fMRI dataset of natural image vision and create a deep learning decoding pipeline inspired by the existence of both bottom-up and top-down processes in human vision. We train a linear brain-to-feature model to map fMRI activity features to visual stimuli features, assuming that the brain projects visual information onto a space that is homeomorphic to the latent space represented by the last convolutional layer of a pretrained convolutional neural network, which typically collects a variety of semantic features that summarize and highlight similarities and differences between concepts. These features are then categorized in the latent space using a nearest-neighbor strategy, and the results are used to condition a generative latent diffusion model to create novel images. From fMRI data only, we produce reconstructions of visual stimuli that match the original content very well on a semantic level, surpassing the state of the art in previous literature. We evaluate our work and obtain good results using a quantitative semantic metric (the Wu-Palmer similarity metric over the WordNet lexicon, which had an average value of 0.57) and perform a human evaluation experiment that resulted in correct evaluation, according to the multiplicity of human criteria in evaluating image similarity, in over 80% of the test set.
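
The decoding step described in the abstract (a linear brain-to-feature map followed by a nearest-neighbor lookup in the feature space) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration, not the authors' implementation: the data are synthetic, the regression is a closed-form ridge regression, cosine similarity is used as the neighbor criterion, and the names `fit_ridge`, `nearest_label`, and the example labels are hypothetical. The generative latent diffusion stage is not shown; the decoded label only stands in for whatever conditioning signal the paper's pipeline passes to it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions): 200 training trials, 50 fMRI features,
# 64-dimensional CNN features for each viewed stimulus.
n_train, n_voxels, n_feat = 200, 50, 64
W_true = rng.normal(size=(n_voxels, n_feat))
X_train = rng.normal(size=(n_train, n_voxels))  # fMRI activity features
Y_train = X_train @ W_true + 0.1 * rng.normal(size=(n_train, n_feat))  # stimulus features


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    gram = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ Y)


def nearest_label(pred, ref_feats, ref_labels):
    """Return the label of the reference vector most cosine-similar to pred."""
    ref_norm = ref_feats / np.linalg.norm(ref_feats, axis=1, keepdims=True)
    pred_norm = pred / np.linalg.norm(pred)
    return ref_labels[int(np.argmax(ref_norm @ pred_norm))]


# Fit the linear brain-to-feature model on the training trials.
W = fit_ridge(X_train, Y_train)

# Hypothetical labeled reference set: features of five held-out stimuli.
ref_labels = ["dog", "car", "tree", "boat", "bird"]
X_test = rng.normal(size=(len(ref_labels), n_voxels))
ref_feats = X_test @ W_true

# Decode each held-out trial: predict its features, then look up the nearest label.
decoded = [nearest_label(x @ W, ref_feats, ref_labels) for x in X_test]
print(decoded)
```

In this toy setting the training set is larger than the number of fMRI features, so the fitted map closely recovers the synthetic ground truth; real fMRI data are typically far noisier and higher-dimensional, and the regularization strength `alpha` would need to be tuned.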
