Paper Title
Efficient Marginalization of Discrete and Structured Latent Variables via Sparsity
Paper Authors
Paper Abstract
Training neural network models with discrete (categorical or structured) latent variables can be computationally challenging, due to the need for marginalization over large or combinatorial sets. To circumvent this issue, one typically resorts to sampling-based approximations of the true marginal, requiring noisy gradient estimators (e.g., score function estimator) or continuous relaxations with lower-variance reparameterized gradients (e.g., Gumbel-Softmax). In this paper, we propose a new training strategy which replaces these estimators by an exact yet efficient marginalization. To achieve this, we parameterize discrete distributions over latent assignments using differentiable sparse mappings: sparsemax and its structured counterparts. In effect, the support of these distributions is greatly reduced, which enables efficient marginalization. We report successful results in three tasks covering a range of latent variable modeling applications: a semisupervised deep generative model, a latent communication game, and a generative model with a bit-vector latent representation. In all cases, we obtain good performance while still achieving the practicality of sampling-based approximations.
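The abstract only sketches the approach, so here is a minimal, hypothetical PyTorch illustration of the core idea for the categorical case (not the authors' code): sparsemax assigns exactly zero probability to most latent assignments, so the expected downstream loss can be computed exactly by summing over the few assignments remaining in the support. The helper exact_marginal_loss and the toy quadratic loss are assumptions made purely for this example.

import torch

def sparsemax(scores: torch.Tensor) -> torch.Tensor:
    """Euclidean projection of a score vector onto the probability simplex
    (Martins & Astudillo, 2016); the result is typically sparse."""
    z, _ = torch.sort(scores, descending=True)
    k = torch.arange(1, scores.numel() + 1, dtype=scores.dtype)
    cumsum = torch.cumsum(z, dim=0)
    in_support = 1 + k * z > cumsum         # monotone condition over candidate support sizes
    k_max = in_support.nonzero().max() + 1  # number of entries kept in the support
    tau = (cumsum[k_max - 1] - 1) / k_max   # threshold subtracted from every score
    return torch.clamp(scores - tau, min=0.0)

def exact_marginal_loss(scores, loss_given_z):
    """Exact expectation E_{z ~ p}[loss(z)], summing only over latent
    assignments with non-zero probability under the sparsemax distribution."""
    p = sparsemax(scores)
    support = p.nonzero().flatten()         # usually far smaller than the full label set
    return sum(p[z] * loss_given_z(int(z)) for z in support)

# Toy usage: 10 latent categories; the quadratic "downstream loss" stands in
# for a decoder's negative log-likelihood given the latent assignment z.
scores = torch.randn(10, requires_grad=True)
loss = exact_marginal_loss(scores, lambda z: (z - 3.0) ** 2)
loss.backward()                             # gradients flow through sparsemax itself

The same sum-over-support idea is what the abstract refers to for the structured counterparts: there the support consists of a small set of structures rather than a small set of categories, and the expectation is again computed only over that set.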