Complex Coordinate-Based Meta-Analysis with Probabilistic Programming

Paper title

Complex Coordinate-Based Meta-Analysis with Probabilistic Programming

Authors

Valentin Iovene, Gaston Zanitti, Demian Wassermann

Abstract

With the growing number of published functional magnetic resonance imaging (fMRI) studies, meta-analysis databases and models have become an integral part of brain mapping research. Coordinate-based meta-analysis (CBMA) databases are built by automatically extracting both the coordinates of reported peak activations and term associations using natural language processing (NLP) techniques. Solving term-based queries on these databases makes it possible to obtain statistical maps of the brain related to specific cognitive processes. However, with tools like Neurosynth, only single-term queries lead to statistically reliable results. When solving richer queries, too few studies from the database contribute to the statistical estimations. We design a probabilistic domain-specific language (DSL) built on Datalog and one of its probabilistic extensions, CP-Logic, for expressing and solving rich logic-based queries. We encode a CBMA database into a probabilistic program. Using the joint distribution of its Bayesian network translation, we show that solutions of queries on this program compute the right probability distributions of voxel activations. We explain how recent lifted query processing algorithms make it possible to scale to the size of large neuroimaging data, where state-of-the-art knowledge compilation (KC) techniques fail to solve queries fast enough for practical applications. Finally, we introduce a method for relating studies to terms probabilistically, leading to better solutions for conjunctive queries on smaller databases. We demonstrate results for two-term conjunctive queries, both on simulated meta-analysis databases and on the widely used Neurosynth database.
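
To make the idea of a two-term conjunctive query concrete, the following is a minimal Python sketch, not the authors' Datalog/CP-Logic DSL. It estimates the probability that a voxel is reported active given that a study is associated with both query terms, first with a hard threshold on term association and then with each study weighted by the product of its term probabilities. The toy database, the 0.5 threshold, the function names, and the assumption that term associations are independent are all illustrative choices that do not come from the paper.

```python
from typing import Dict, List, Optional, Sequence, Set, TypedDict


class Study(TypedDict):
    voxels: Set[int]          # ids of voxels reported as active
    terms: Dict[str, float]   # term -> association probability in [0, 1]


# Made-up toy database, for illustration only.
STUDIES: List[Study] = [
    {"voxels": {1, 2}, "terms": {"pain": 0.9, "emotion": 0.8}},
    {"voxels": {2, 3}, "terms": {"pain": 0.7, "emotion": 0.1}},
    {"voxels": {2}, "terms": {"pain": 0.2, "emotion": 0.9}},
    {"voxels": {4}, "terms": {"memory": 0.9}},
]


def hard_conjunctive(studies: Sequence[Study], voxel: int,
                     terms: Sequence[str], threshold: float = 0.5) -> Optional[float]:
    """Estimate P(voxel active | all terms) using only studies whose
    association with every query term reaches the threshold."""
    selected = [s for s in studies
                if all(s["terms"].get(t, 0.0) >= threshold for t in terms)]
    if not selected:
        return None  # no study matches the conjunction
    return sum(voxel in s["voxels"] for s in selected) / len(selected)


def soft_conjunctive(studies: Sequence[Study], voxel: int,
                     terms: Sequence[str]) -> Optional[float]:
    """Same query, but every study contributes with a weight equal to the
    product of its term probabilities (independence is assumed here)."""
    weights = []
    for s in studies:
        w = 1.0
        for t in terms:
            w *= s["terms"].get(t, 0.0)
        weights.append(w)
    total = sum(weights)
    if total == 0.0:
        return None
    active = sum(w for w, s in zip(weights, studies) if voxel in s["voxels"])
    return active / total


if __name__ == "__main__":
    query = ["pain", "emotion"]
    for v in (1, 2, 3):
        print(v, hard_conjunctive(STUDIES, v, query), soft_conjunctive(STUDIES, v, query))
```

In this toy database the hard threshold leaves a single study for the query "pain and emotion", so the estimate rests on one observation, while the weighted version lets the two partially matching studies contribute as well. This mirrors, in a very simplified form, the sparsity problem the abstract describes for conjunctive queries.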
