Paper Title
Scaling Hidden Markov Language Models
Authors
Abstract
The hidden Markov model (HMM) is a fundamental tool for sequence modeling that cleanly separates the hidden state from the emission structure. However, this separation makes it difficult to fit HMMs to large datasets in modern NLP, and they have fallen out of use due to very poor performance compared to fully observed models. This work revisits the challenge of scaling HMMs to language modeling datasets, taking ideas from recent approaches to neural modeling. We propose methods for scaling HMMs to massive state spaces while maintaining efficient exact inference, a compact parameterization, and effective regularization. Experiments show that this approach leads to models that are more accurate than previous HMM and n-gram-based methods, making progress towards the performance of state-of-the-art neural models.
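For context on the exact-inference requirement mentioned in the abstract, the sketch below shows the standard HMM forward algorithm in log space. It is a minimal illustration only, not the paper's compact parameterization or regularization scheme; the function and variable names (hmm_log_likelihood, log_pi, log_A, log_B) are illustrative assumptions. The O(K^2) cost per time step over K hidden states is what makes scaling to massive state spaces challenging.

```python
import numpy as np
from scipy.special import logsumexp

def hmm_log_likelihood(log_pi, log_A, log_B, obs):
    """Exact log-likelihood of a token sequence under an HMM via the forward algorithm.

    log_pi: (K,)   log initial state distribution
    log_A:  (K, K) log transitions, log_A[i, j] = log p(z_t = j | z_{t-1} = i)
    log_B:  (K, V) log emissions,   log_B[i, v] = log p(x_t = v | z_t = i)
    obs:    sequence of token ids of length T
    """
    # alpha[j] = log p(x_1..x_t, z_t = j)
    alpha = log_pi + log_B[:, obs[0]]
    for x in obs[1:]:
        # Each step sums over all K previous states for each of K current
        # states: O(K^2) per token, the bottleneck when K is massive.
        alpha = logsumexp(alpha[:, None] + log_A, axis=0) + log_B[:, x]
    return logsumexp(alpha)

# Usage on a small random HMM (toy sizes; a scaled HMM would use far larger K).
K, V, T = 4, 10, 5
rng = np.random.default_rng(0)
log_pi = np.log(rng.dirichlet(np.ones(K)))
log_A = np.log(rng.dirichlet(np.ones(K), size=K))
log_B = np.log(rng.dirichlet(np.ones(V), size=K))
obs = rng.integers(0, V, size=T)
print(hmm_log_likelihood(log_pi, log_A, log_B, obs))
```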