光谱邻居加入了潜在树模型的重建

论文标题

光谱邻居加入了潜在树模型的重建

Spectral neighbor joining for reconstruction of latent tree models

论文作者

Jaffe, Ariel, Amsel, Noah, Aizenbud, Yariv, Nadler, Boaz, Chang, Joseph T., Kluger, Yuval

论文摘要

多个科学应用中的一个常见假设是，观察到的数据的分布可以通过潜在的树图形模型来建模。一个重要的例子是系统发育学，其中树对一组观察到的生物的进化谱系进行了建模。考虑到一组独立的对树叶叶子的随机变量的实现，一个关键的挑战是推断潜在的树拓扑。在这项工作中，我们开发了频谱邻域（SNJ），这是一种恢复潜在树图形模型结构的新方法。给定一个矩阵，该矩阵包含所有对观察到的变量之间相似性的度量，SNJ计算了观察到的变量组之间内聚力的光谱度量。我们证明SNJ是一致的，并得出了从估计的相似性矩阵中正确恢复的足够条件。将这种条件与量度集中的相似性矩阵结合在一起，我们绑定了以高概率恢复树所需的样品数量。我们通过广泛的模拟说明，与其他几种重建方法相比，SNJ需要更少的样品来准确地恢复大量叶子或长边缘的树木。

A common assumption in multiple scientific applications is that the distribution of observed data can be modeled by a latent tree graphical model. An important example is phylogenetics, where the tree models the evolutionary lineages of a set of observed organisms. Given a set of independent realizations of the random variables at the leaves of the tree, a key challenge is to infer the underlying tree topology. In this work we develop Spectral Neighbor Joining (SNJ), a novel method to recover the structure of latent tree graphical models. Given a matrix that contains a measure of similarity between all pairs of observed variables, SNJ computes a spectral measure of cohesion between groups of observed variables. We prove that SNJ is consistent, and derive a sufficient condition for correct tree recovery from an estimated similarity matrix. Combining this condition with a concentration of measure result on the similarity matrix, we bound the number of samples required to recover the tree with high probability. We illustrate via extensive simulations that in comparison to several other reconstruction methods, SNJ requires fewer samples to accurately recover trees with a large number of leaves or long edges.

下载PDF全文

下载文献需遵守相关版权规定

论文标题