Hard and Soft EM in Bayesian Network Learning from Incomplete Data

Paper title

Hard and Soft EM in Bayesian Network Learning from Incomplete Data

Authors

Andrea Ruggieri, Francesco Stranieri, Fabio Stella, Marco Scutari

Abstract

Incomplete data are a common feature in many domains, from clinical trials to industrial applications. Bayesian networks (BNs) are often used in these domains because of their graphical and causal interpretations. BN parameter learning from incomplete data is usually implemented with the Expectation-Maximisation algorithm (EM), which computes the relevant sufficient statistics ("soft EM") using belief propagation. Similarly, the Structural Expectation-Maximisation algorithm (Structural EM) learns the network structure of the BN from those sufficient statistics using algorithms designed for complete data. However, practical implementations of parameter and structure learning often impute missing data ("hard EM") to compute sufficient statistics instead of using belief propagation, for both ease of implementation and computational speed. In this paper, we investigate the question: what is the impact of using imputation instead of belief propagation on the quality of the resulting BNs? From a simulation study using synthetic data and reference BNs, we find that it is possible to recommend one approach over the other in several scenarios based on the characteristics of the data. We then use this information to build a simple decision tree to guide practitioners in choosing the EM algorithm best suited to their problem.
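
To make the distinction between the two variants concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy network X -> Y with binary variables in which only X can be missing, so belief propagation reduces to a single application of Bayes' rule. The names `Params`, `posterior_x1`, `em_step`, and `fit`, the starting values, and the small pseudo-count `ALPHA` are all illustrative assumptions. Soft EM accumulates expected counts weighted by the posterior of the missing value, while hard EM imputes the most probable value and counts it as if it had been observed.

```python
from collections import namedtuple

# Toy network X -> Y, both binary. X may be missing (None); Y is always observed.
# p_x = P(X=1); p_y_given_x = [P(Y=1 | X=0), P(Y=1 | X=1)].
Params = namedtuple("Params", ["p_x", "p_y_given_x"])

ALPHA = 1e-6  # assumed tiny pseudo-count, only to avoid division by zero


def posterior_x1(params, y):
    """P(X=1 | Y=y) via Bayes' rule (what belief propagation reduces to here)."""
    def lik(x):
        p = params.p_y_given_x[x]
        return p if y == 1 else 1.0 - p
    num = params.p_x * lik(1)
    den = num + (1.0 - params.p_x) * lik(0)
    return num / den


def em_step(params, data, hard):
    """One EM iteration. hard=True imputes X; hard=False uses expected counts."""
    n_x = [0.0, 0.0]    # (expected) counts of X=0 and X=1
    n_xy1 = [0.0, 0.0]  # (expected) counts of Y=1 jointly with X=0 and X=1
    for x, y in data:
        if x is None:
            w1 = posterior_x1(params, y)        # E-step for the missing value
            if hard:
                w1 = 1.0 if w1 >= 0.5 else 0.0  # impute the most probable value
        else:
            w1 = float(x)                       # observed value: weight is 0 or 1
        for value, w in ((0, 1.0 - w1), (1, w1)):
            n_x[value] += w
            n_xy1[value] += w * y
    # M-step: maximum-likelihood estimates from the sufficient statistics.
    p_x = n_x[1] / len(data)
    p_y_given_x = [(n_xy1[v] + ALPHA) / (n_x[v] + 2 * ALPHA) for v in (0, 1)]
    return Params(p_x, p_y_given_x)


def fit(data, hard, iterations=50):
    params = Params(p_x=0.5, p_y_given_x=[0.3, 0.7])  # arbitrary starting point
    for _ in range(iterations):
        params = em_step(params, data, hard)
    return params


if __name__ == "__main__":
    data = [(1, 1), (1, 1), (0, 0), (0, 0), (1, 0), (0, 1),
            (None, 1), (None, 0), (None, 1)]
    print("soft EM:", fit(data, hard=False))
    print("hard EM:", fit(data, hard=True))
```

On complete data the two variants coincide, since there is nothing to impute or average over; they differ only in how rows with missing values contribute to the sufficient statistics.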
