Title

High-Dimensional Undirected Graphical Models for Arbitrary Mixed Data

Authors

Konstantin Göbler, Anne Miloschewski, Mathias Drton, Sach Mukherjee

Abstract

Graphical models are an important tool for exploring relationships between variables in complex, multivariate data. Methods for learning such graphical models are well developed in the case where all variables are either continuous or discrete, including in high dimensions. However, in many applications data span variables of different types (e.g. continuous, count, binary, ordinal), whose principled joint analysis is nontrivial. Latent Gaussian copula models, in which all variables are modeled as transformations of underlying jointly Gaussian variables, represent a useful approach. Recent advances have shown how the binary-continuous case can be tackled, but the general mixed variable type regime remains challenging. In this work, we make the simple yet useful observation that classical ideas concerning polychoric and polyserial correlations can be leveraged in a latent Gaussian copula framework. Building on this observation, we propose flexible and scalable methodology for data with variables of entirely general mixed type. We study the key properties of the approaches theoretically and empirically, via extensive simulations as well as an illustrative application to data from the UK Biobank concerning COVID-19 risk factors.
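
To make the idea of a latent correlation concrete, the following is a minimal sketch of the classical tetrachoric correlation, the binary-binary special case of the polychoric correlation mentioned in the abstract. It is not the authors' estimator: the simulation settings, thresholds, and the function names `upper_orthant_prob` and `tetrachoric` are assumptions chosen purely for illustration. The sketch dichotomizes two latent Gaussian variables, estimates the thresholds from the marginal proportions, and then solves for the latent correlation by bisection.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def upper_orthant_prob(a, b, rho, steps=2000, upper=8.0):
    """Approximate P(Z1 > a, Z2 > b) for a standard bivariate normal.

    Uses Z2 | Z1 = x ~ N(rho * x, 1 - rho^2) and integrates over x in
    [a, upper] with Simpson's rule (steps must be even).
    """
    scale = math.sqrt(1.0 - rho * rho)

    def integrand(x):
        return STD_NORMAL.pdf(x) * (1.0 - STD_NORMAL.cdf((b - rho * x) / scale))

    h = (upper - a) / steps
    total = integrand(a) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(a + i * h)
    return total * h / 3.0


def tetrachoric(x1, x2, iters=50):
    """Illustrative two-step tetrachoric estimate for two 0/1 sequences."""
    n = len(x1)
    p1 = sum(x1) / n
    p2 = sum(x2) / n
    p11 = sum(u * v for u, v in zip(x1, x2)) / n
    # Step 1: thresholds implied by the marginal proportions of ones.
    a = STD_NORMAL.inv_cdf(1.0 - p1)
    b = STD_NORMAL.inv_cdf(1.0 - p2)
    # Step 2: the orthant probability increases with rho, so bisection
    # finds the rho whose implied P(both ones) matches the observed p11.
    lo, hi = -0.999, 0.999
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if upper_orthant_prob(a, b, mid) < p11:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Simulated example (assumed settings): latent correlation 0.6,
# dichotomized at the arbitrary thresholds 0.3 and -0.5.
rng = random.Random(0)
rho_true = 0.6
n = 20000
x1, x2 = [], []
for _ in range(n):
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho_true * z1 + math.sqrt(1.0 - rho_true**2) * rng.gauss(0.0, 1.0)
    x1.append(1 if z1 > 0.3 else 0)
    x2.append(1 if z2 > -0.5 else 0)

rho_hat = tetrachoric(x1, x2)
print(f"true rho = {rho_true}, tetrachoric estimate = {rho_hat:.3f}")
```

The Pearson correlation computed directly on the 0/1 data would be attenuated relative to the latent correlation, which is why a latent-scale estimate of this kind is a natural input for graph estimation under a latent Gaussian copula model.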
