Paper title
Coupling Deep Imputation with Multitask Learning for Downstream Tasks on Genomics Data
Paper authors
Paper abstract
Genomics data such as RNA gene expression, methylation and microRNA expression are valuable sources of information for various clinical predictive tasks. For example, survival outcomes, cancer histology type and other patient-related information can be predicted using not only clinical data but molecular data as well. Moreover, using these data sources together, for example in multitask learning, can boost performance. In practice, however, many data points are missing, which leads to significantly lower patient numbers when analysing only full cases, which in our setting means patients with all modalities present. In this paper we investigate how imputing missing values with deep learning, coupled with multitask learning, can help to reach state-of-the-art performance using the combined genomics modalities RNA, microRNA and methylation. We propose a generalised deep imputation method to impute values where a patient has all modalities present except one. Interestingly, deep imputation alone outperforms multitask learning alone for the classification and regression tasks across most combinations of modalities. In contrast, when using all modalities for survival prediction, we observe that multitask learning alone outperforms deep imputation alone with statistical significance (adjusted p-value 0.03). Thus, the two approaches are complementary when optimising performance for downstream predictive tasks.
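To make the abstract's notions of "full cases" and "all modalities present except one" concrete, the following minimal Python sketch splits a cohort into complete cases and patients who would be candidates for imputation. It is an illustration only, not the authors' code: the patient IDs, the `availability` table, and the `split_cohort` helper are hypothetical, and the imputation model itself is not shown.

```python
# Illustrative sketch only: all names and data here are hypothetical.
MODALITIES = ("rna", "mirna", "methylation")

# Hypothetical availability table: which modalities were measured per patient.
availability = {
    "P1": {"rna", "mirna", "methylation"},
    "P2": {"rna", "methylation"},
    "P3": {"rna", "mirna", "methylation"},
    "P4": {"mirna"},
    "P5": {"mirna", "methylation"},
}


def split_cohort(availability, modalities=MODALITIES):
    """Return (complete_cases, imputable).

    complete_cases: patients with every modality present ("full cases").
    imputable: patient -> the single missing modality, i.e. patients who
    have all modalities except one.
    """
    required = set(modalities)
    complete, imputable = [], {}
    for patient, present in availability.items():
        missing = required - present
        if not missing:
            complete.append(patient)
        elif len(missing) == 1:
            imputable[patient] = next(iter(missing))
        # Patients missing two or more modalities fall outside this setting.
    return complete, imputable


complete, imputable = split_cohort(availability)
print(complete)   # ['P1', 'P3']
print(imputable)  # {'P2': 'mirna', 'P5': 'rna'}
```

In this toy cohort, a full-case analysis keeps only two of five patients, while imputing the single missing modality would recover two more; the patient with only one modality measured remains excluded.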