Paper Title

Finding Groups of Cross-Correlated Features in Bi-View Data

Authors

Miheer Dewaskar, John Palowitch, Mark He, Michael I. Love, Andrew B. Nobel

Abstract

Datasets in which measurements of two (or more) types are obtained from a common set of samples arise in many scientific applications. A common problem in the exploratory analysis of such data is to identify groups of features of different data types that are strongly associated. A bimodule is a pair (A,B) of feature sets from two data types such that the aggregate cross-correlation between the features in A and those in B is large. A bimodule (A,B) is stable if A coincides with the set of features that have significant aggregate correlation with the features in B, and vice-versa. This paper proposes an iterative-testing based bimodule search procedure (BSP) to identify stable bimodules. Compared to existing methods for detecting cross-correlated features, BSP was the best at recovering true bimodules with sufficient signal, while limiting the false discoveries. In addition, we applied BSP to the problem of expression quantitative trait loci (eQTL) analysis using data from the GTEx consortium. BSP identified several thousand SNP-gene bimodules. While many of the individual SNP-gene pairs appearing in the discovered bimodules were identified by standard eQTL methods, the discovered bimodules revealed genomic subnetworks that appeared to be biologically meaningful and worthy of further scientific investigation.
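The stability condition in the abstract is a fixed-point property: A is the set of features associated with B, and B is the set of features associated with A. The Python sketch below illustrates only that fixed-point idea. It is not the BSP procedure: the paper uses iterative hypothesis testing to decide which features are significantly associated, whereas this toy version assumes a fixed cutoff on the mean squared sample correlation. The function names, the `threshold` parameter, and the seeding scheme are all hypothetical.

```python
import numpy as np


def cross_correlation(X, Y):
    """Sample cross-correlation matrix between the columns of X (n x p) and Y (n x q)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    return Xs.T @ Ys / X.shape[0]


def toy_bimodule_search(X, Y, seed_a, threshold, max_iter=50):
    """Iterate between the two views until the pair (A, B) stops changing.

    ASSUMPTION: a feature is "associated" with a set when its mean squared
    correlation with that set exceeds `threshold`. BSP itself uses
    significance tests, which this sketch does not implement.
    Returns (A, B) as sorted index lists, or None if no fixed point is found.
    """
    r2 = cross_correlation(X, Y) ** 2
    A, B = frozenset(seed_a), frozenset()
    for _ in range(max_iter):
        # Features of Y whose aggregate squared correlation with A is large.
        score_b = r2[sorted(A), :].sum(axis=0)
        new_B = frozenset(np.flatnonzero(score_b > threshold * len(A)).tolist())
        if not new_B:
            return None
        # Features of X whose aggregate squared correlation with new_B is large.
        score_a = r2[:, sorted(new_B)].sum(axis=1)
        new_A = frozenset(np.flatnonzero(score_a > threshold * len(new_B)).tolist())
        if not new_A:
            return None
        if new_A == A and new_B == B:
            # Fixed point: each set is exactly what the other set selects.
            return sorted(A), sorted(B)
        A, B = new_A, new_B
    return None
```

Starting the search from a single seed feature in one view and letting it grow mirrors the idea of searching for stable pairs, but the output of this sketch carries no false-discovery guarantee of the kind the abstract attributes to BSP.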
