Paper title
Multivariate Bayesian variable selection with application to multi-trait genetic fine mapping
Authors
Abstract
Variable selection has played a critical role in modern statistical learning and scientific discovery. Numerous regularization and Bayesian variable selection methods have been developed over the past two decades, but most of them select variables for only one response. As more data are collected, it has become common to analyze multiple related responses from the same study. Existing multivariate variable selection methods select variables for all responses without considering possible heterogeneity across responses, i.e., some features may predict only a subset of the responses but not the rest. Motivated by the multi-trait fine-mapping problem in genetics, which aims to identify the causal variants for multiple related traits, we develop a novel multivariate Bayesian variable selection method that selects critical predictors from a large number of grouped predictors targeting multiple correlated and possibly heterogeneous responses. Our method features selection at multiple levels, incorporates prior biological knowledge to guide selection, and identifies the best subset of responses that each predictor targets. We demonstrate the advantages of our method through extensive simulations and a real fine-mapping example that identifies causal variants associated with different subsets of addictive behaviors.
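To make the heterogeneity described above concrete, the following Python sketch simulates a toy version of the setting: grouped predictors and several correlated responses, where each causal predictor affects only a subset of the responses. It is not the authors' Bayesian model. The function name `simulate_heterogeneous_responses`, the group sizes, the unit effect sizes, and the compound-symmetry residual covariance are all hypothetical choices made for illustration. A per-response least-squares fit with an arbitrary 0.5 threshold serves only as a naive sanity check in this easy, low-dimensional case.

```python
import numpy as np


def simulate_heterogeneous_responses(n=500, n_groups=5, group_size=4,
                                     n_responses=3, noise_sd=1.0, rho=0.5,
                                     seed=0):
    """Hypothetical toy setup: grouped predictors, correlated responses,
    and causal predictors that each affect only a subset of responses."""
    rng = np.random.default_rng(seed)
    p = n_groups * group_size
    X = rng.standard_normal((n, p))

    # gamma[j, k] is True when predictor j affects response k.
    # Illustrative structure (an assumption, not from the paper):
    #   - only groups 0 and 1 contain a causal predictor,
    #   - the first predictor of group 0 affects responses 0 and 1,
    #   - the first predictor of group 1 affects response 2 only.
    gamma = np.zeros((p, n_responses), dtype=bool)
    gamma[0, [0, 1]] = True
    gamma[group_size, 2] = True

    # Unit effect for every (predictor, response) pair marked in gamma.
    B = np.zeros((p, n_responses))
    B[gamma] = 1.0

    # Correlated residuals across responses (compound-symmetry covariance).
    Sigma = noise_sd ** 2 * ((1 - rho) * np.eye(n_responses)
                             + rho * np.ones((n_responses, n_responses)))
    E = rng.multivariate_normal(np.zeros(n_responses), Sigma, size=n)

    Y = X @ B + E
    return X, Y, gamma


if __name__ == "__main__":
    X, Y, gamma = simulate_heterogeneous_responses()
    # Naive check: least squares fit for all responses at once,
    # then threshold the coefficients (threshold chosen arbitrarily).
    B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    for j in np.flatnonzero(gamma.any(axis=1)):
        print("predictor", j, "-> responses",
              np.flatnonzero(np.abs(B_hat[j]) > 0.5))
```

With many correlated predictors, as in real fine-mapping data, this kind of naive thresholding becomes unreliable, which is the situation a dedicated multi-level selection method such as the one described in the abstract is meant to handle.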