Paper Title

Learning Spoken Language Representations with Neural Lattice Language Modeling

Authors

Chao-Wei Huang, Yun-Nung Chen

Abstract

Pre-trained language models have achieved huge improvement on many NLP tasks. However, these methods are usually designed for written text, so they do not consider the properties of spoken language. Therefore, this paper aims at generalizing the idea of language model pre-training to lattices generated by recognition systems. We propose a framework that trains neural lattice language models to provide contextualized representations for spoken language understanding tasks. The proposed two-stage pre-training approach reduces the demands of speech data and has better efficiency. Experiments on intent detection and dialogue act recognition datasets demonstrate that our proposed method consistently outperforms strong baselines when evaluated on spoken inputs. The code is available at https://github.com/MiuLab/Lattice-ELMo.
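For readers unfamiliar with the lattices the abstract refers to, a minimal sketch (not from the paper; all node IDs, words, and probabilities are hypothetical) of an ASR word lattice as a weighted DAG, with a Viterbi-style search for its most probable path:

```python
from collections import defaultdict

# A word lattice compactly encodes alternative ASR hypotheses as a DAG:
# each edge carries a word hypothesis and its probability.
# The nodes, words, and weights below are illustrative, not from the paper.
lattice = defaultdict(list)  # node -> list of (next_node, word, prob)
lattice[0] += [(1, "flights", 0.7), (1, "flight", 0.3)]
lattice[1] += [(2, "to", 0.9), (2, "two", 0.1)]
lattice[2] += [(3, "boston", 1.0)]

def best_path(lattice, start=0, end=3):
    """Return (prob, words) of the highest-probability path through the DAG."""
    best = {start: (1.0, [])}        # node -> (path prob, words so far)
    for node in sorted(lattice):     # node IDs are topologically ordered
        if node not in best:
            continue
        p, words = best[node]
        for nxt, word, q in lattice[node]:
            cand = (p * q, words + [word])
            if nxt not in best or cand[0] > best[nxt][0]:
                best[nxt] = cand
    return best[end]

prob, words = best_path(lattice)
print(words, round(prob, 3))  # -> ['flights', 'to', 'boston'] 0.63
```

A 1-best ASR transcript keeps only this single path and discards the alternatives; the paper's point is that pre-training directly on the full lattice preserves those competing hypotheses for downstream understanding tasks.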
