Object-Attribute Biclustering for Elimination of Missing Genotypes in Ischemic Stroke Genome-Wide Data

Paper title

Object-Attribute Biclustering for Elimination of Missing Genotypes in Ischemic Stroke Genome-Wide Data

Paper authors

Ignatov, Dmitry I., Khvorykh, Gennady V., Khrunin, Andrey V., Nikolić, Stefan, Shaban, Makhmud, Petrova, Elizaveta A., Koltsova, Evgeniya A., Takelait, Fouzi, Egurnov, Dmitrii

Abstract

Missing genotypes can reduce the efficacy of machine learning approaches for identifying genetic risk variants of common diseases and traits. The problem occurs when genotypic data are collected from different experiments with different DNA microarrays, each characterised by its own pattern of uncalled (missing) genotypes. This can prevent a machine learning classifier from assigning classes correctly. To tackle this issue, we used the well-developed notions of object-attribute biclusters and formal concepts, which correspond to dense subrelations in the binary relation $\textit{patients} \times \textit{SNPs}$. The paper contains experimental results on applying a biclustering algorithm to a large real-world dataset collected for studying the genetic bases of ischemic stroke. The algorithm could identify large dense biclusters in the genotypic matrix for further processing, which in turn significantly improved the quality of machine learning classifiers. The proposed algorithm was also able to generate biclusters for the whole dataset without size constraints, in contrast to the In-Close4 algorithm for generating formal concepts.
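
The abstract does not spell out how an object-attribute (OA) bicluster is built, so the following is only a minimal sketch under the commonly used definition: for a pair (g, m) in the relation, the bicluster is (m', g'), where m' is the set of objects having attribute m and g' is the set of attributes of object g, and its density is the share of cells in m' × g' that belong to the relation. The function name `oa_biclusters`, the `min_density` parameter, and the toy data are hypothetical and are not taken from the paper; reading the relation as "genotype was called for this patient and SNP" is likewise an assumption.

```python
from itertools import product


def oa_biclusters(relation, min_density=0.0):
    """Enumerate OA-biclusters of a binary relation (illustrative sketch).

    relation: set of (obj, attr) pairs, e.g. (patient, SNP) pairs whose
              genotype is present (assumed encoding).
    Returns a dict mapping (extent, intent) frozenset pairs to their density.
    """
    obj_attrs = {}  # g -> g' (attributes of object g)
    attr_objs = {}  # m -> m' (objects having attribute m)
    for g, m in relation:
        obj_attrs.setdefault(g, set()).add(m)
        attr_objs.setdefault(m, set()).add(g)

    result = {}
    for g, m in relation:
        extent = frozenset(attr_objs[m])  # m'
        intent = frozenset(obj_attrs[g])  # g'
        if (extent, intent) in result:
            continue  # the same bicluster can be generated by several pairs
        # Count the cells of extent x intent that are present in the relation.
        filled = sum((x, y) in relation for x, y in product(extent, intent))
        result[(extent, intent)] = filled / (len(extent) * len(intent))

    # Keep only biclusters that are dense enough.
    return {b: d for b, d in result.items() if d >= min_density}


# Toy example (made-up data): three patients, three SNPs.
called = {
    ("p1", "s1"), ("p1", "s2"),
    ("p2", "s1"), ("p2", "s2"),
    ("p3", "s2"), ("p3", "s3"),
}

for (extent, intent), density in oa_biclusters(called, min_density=0.8).items():
    print(sorted(extent), sorted(intent), round(density, 3))
```

This naive version recomputes the density cell by cell, which is fine for a toy relation but would need a more economical counting scheme on a genome-wide matrix. The density threshold is what lets dense blocks of called genotypes be kept while sparse ones are dropped.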

Paper title