High-dimensional sparse vine copula regression with application to genomic prediction

Title

High-dimensional sparse vine copula regression with application to genomic prediction

Authors

Özge Sahin, Claudia Czado

Abstract

High-dimensional data sets are often available in genome-enabled predictions. Such data sets include nonlinear relationships with complex dependence structures. For such situations, vine copula based (quantile) regression is an important tool. However, the current vine copula based regression approaches do not scale up to high and ultra-high dimensions. To perform high-dimensional sparse vine copula based regression, we propose two methods. First, we show their superiority over the existing methods in terms of computational complexity. Second, we define relevant, irrelevant, and redundant explanatory variables for quantile regression. Then, via simulation studies, we show our methods' power in selecting relevant variables and their prediction accuracy on high-dimensional sparse data sets. Next, we apply the proposed methods to high-dimensional real data, aiming at the genomic prediction of maize traits. Some data-processing and feature extraction steps for the real data are also discussed. Finally, we show the advantage of our methods over linear models and quantile regression forests in both the simulation studies and the real data application.
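To make the phrase "vine copula based (quantile) regression" concrete, here is a minimal sketch of its basic building block, not the authors' sparse methods. It computes a conditional quantile of a response Y given a single predictor X by inverting the h-function of one bivariate Gaussian pair copula. It assumes one predictor, a Gaussian copula with a known correlation parameter `rho`, and known marginal CDF/quantile functions; the helper names `gaussian_h_inverse` and `conditional_quantile` are hypothetical. In a full vine model, roughly speaking, several such inverse h-function steps would be chained across the pair copulas.

```python
from statistics import NormalDist
import math

# Standard normal distribution used for the Gaussian copula transforms.
STD_NORMAL = NormalDist()


def gaussian_h_inverse(alpha: float, u: float, rho: float) -> float:
    """Inverse h-function of a bivariate Gaussian copula.

    Returns v in (0, 1) such that P(V <= v | U = u) = alpha.
    """
    if not 0.0 < alpha < 1.0 or not 0.0 < u < 1.0:
        raise ValueError("alpha and u must lie strictly in (0, 1)")
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly in (-1, 1)")
    z = rho * STD_NORMAL.inv_cdf(u) + math.sqrt(1.0 - rho ** 2) * STD_NORMAL.inv_cdf(alpha)
    return STD_NORMAL.cdf(z)


def conditional_quantile(alpha, x, rho, cdf_x, quantile_y):
    """Hypothetical helper: alpha-quantile of Y given X = x.

    cdf_x maps x to the copula scale u = F_X(x); quantile_y maps the
    copula-scale quantile back to the response scale via F_Y^{-1}.
    """
    u = cdf_x(x)                           # probability integral transform of X
    v = gaussian_h_inverse(alpha, u, rho)  # conditional quantile on (0, 1)
    return quantile_y(v)                   # back-transform to the scale of Y


if __name__ == "__main__":
    # Assumption for the demo: standard normal marginals for both X and Y.
    for level in (0.1, 0.5, 0.9):
        q = conditional_quantile(level, 1.0, 0.6, STD_NORMAL.cdf, STD_NORMAL.inv_cdf)
        print(f"alpha={level}: q={q:.4f}")
```

With standard normal marginals this reduces to the linear conditional quantile ρx + √(1 − ρ²)Φ⁻¹(α). Swapping in a non-Gaussian pair copula (for example Clayton or Gumbel) would give nonlinear quantile curves, which is why the vine approach suits the nonlinear dependence described above.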
